I have a simple download function in a class that might be dealing with files of many hundreds of megabytes at a time from an Amazon Web Services bucket. The whole file cannot be loaded into memory at once, so it must be streamed directly to a file pointer. This is my understanding as this is the first time I've dealt with this issue and I'm picking things up as I go along.
I've ended up with this, based on a 4 KB file buffer which simple testing showed was a good size:
$fs = fsockopen($host, 80, $errno, $errstr, 30);
if (!$fs) {
    $this->writeDebugInfo("FAILED ", $errstr . '(' . $errno . ')');
} else {
    $out = "GET $file HTTP/1.1\r\n";
    $out .= "Host: $host\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fs, $out);
    $fm = fopen($temp_file_name, "w");
    stream_set_timeout($fs, 30);
    while (!feof($fs) && ($debug = fgets($fs)) != "\r\n"); // ignore headers
    while (!feof($fs)) {
        $contents = fgets($fs, 4096);
        fwrite($fm, $contents);
        $info = stream_get_meta_data($fs);
        if ($info['timed_out']) {
            break;
        }
    }
    fclose($fm);
    fclose($fs);
    if ($info['timed_out']) {
        // Delete temp file if fails
        unlink($temp_file_name);
        $this->writeDebugInfo("FAILED - Connection timed out: ", $temp_file_name);
    } else {
        // Move temp file if succeeds
        $media_file_name = str_replace('temp/', 'media/', $temp_file_name);
        rename($temp_file_name, $media_file_name);
        $this->writeDebugInfo("SUCCESS: ", $media_file_name);
    }
}
In testing it's fine. However, I've got into a conversation with someone who says I'm not understanding how fgets() and feof() work together, and who mentions chunked encoding as a more efficient method.
Is the code generally OK, or am I missing something vital here? And what benefit would chunked encoding give me?
Your solution seems fine to me; however, I have a few comments.
1) Don't create the HTTP request yourself; instead use something like cURL. This is more foolproof and will support a wider range of responses the server might reply with. Additionally, cURL can be set up to write directly to a file, saving you doing it yourself.
2) Using fgets may be a problem if you are reading binary data. fgets reads to the end of a line, and with binary data this may corrupt your download. Instead I suggest fread($fs, 4096);, which handles both text and binary data.
3) Chunked encoding is a way for a web server to send you the response in multiple chunks. I don't think this is very useful to you; however, an encoding the web server might support that is more useful is gzip. It would allow the web server to compress the response on the fly. If you use a library like cURL, it will tell the server it supports gzip and then automatically decompress the response for you.
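For what it's worth, enabling that in cURL is (as far as I know) a single option: setting CURLOPT_ENCODING to an empty string makes libcurl advertise every encoding it supports and decompress the response transparently. A sketch, reusing $url and an already-open file handle $fh from elsewhere in this thread:

// minimal sketch: let cURL negotiate gzip/deflate and decompress for you
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FILE, $fh);    // write the body straight to the file
curl_setopt($ch, CURLOPT_ENCODING, ''); // '' = offer every encoding cURL supports
curl_exec($ch);
curl_close($ch);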
I hope this helps
Don't deal with sockets yourself; simplify your code and use the cURL library (PHP cURL), like this:
$url = 'http://' . $host . '/' . $file;
// open the output file and create a new cURL resource
$fh = fopen($temp_file_name, "wb");
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FILE, $fh); // stream the response body straight into the file
// execute the request; the body is written to $fh, not returned
curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
fclose($fh);
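If you adopt this, a few more standard cURL options make it sturdier for multi-hundred-megabyte files; a sketch (same $ch, $fh and $temp_file_name as above), with the options set before the curl_exec() call:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects (bucket URLs often redirect)
curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP errors (>= 400) as failures
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);   // give up if no connection within 30 s
curl_setopt($ch, CURLOPT_LOW_SPEED_LIMIT, 1);   // abort if the transfer drops below...
curl_setopt($ch, CURLOPT_LOW_SPEED_TIME, 30);   // ...1 byte/s for 30 consecutive seconds

if (curl_exec($ch) === false) {
    fclose($fh);
    unlink($temp_file_name); // don't leave a partial file behind
}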
And here's the final result, in case it helps anyone else. I also wrapped the whole thing in a retry loop to decrease the risk of a completely failed download, though it does increase resource use:
$download_attempt = 0;
do {
    $info = array('timed_out' => false);
    $fs = fopen('http://' . $host . $file, "rb");
    if (!$fs) {
        // Note: fopen() does not populate $errno/$errstr, so just log the URL
        $this->writeDebugInfo("FAILED to open: ", 'http://' . $host . $file);
    } else {
        $fm = fopen($temp_file_name, "wb");
        stream_set_timeout($fs, 30);
        while (!feof($fs)) {
            $contents = fread($fs, 4096); // Buffered download
            fwrite($fm, $contents);
            $info = stream_get_meta_data($fs);
            if ($info['timed_out']) {
                break;
            }
        }
        fclose($fm);
        fclose($fs);
        if ($info['timed_out']) {
            // Delete temp file if fails
            unlink($temp_file_name);
            $this->writeDebugInfo("FAILED on attempt " . $download_attempt . " - Connection timed out: ", $temp_file_name);
            $download_attempt++;
            if ($download_attempt < 5) {
                $this->writeDebugInfo("RETRYING: ", $temp_file_name);
            }
        } else {
            // Move temp file if succeeds
            $media_file_name = str_replace('temp/', 'media/', $temp_file_name);
            rename($temp_file_name, $media_file_name);
            $this->newDownload = true;
            $this->writeDebugInfo("SUCCESS: ", $media_file_name);
        }
    }
} while ($download_attempt < 5 && $info['timed_out']);
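One extra check worth considering (my addition, not part of the code above): for http:// streams, stream_get_meta_data() exposes the raw response header lines in 'wrapper_data', so you can count the bytes you receive and compare them against Content-Length to catch a truncated download that never timed out. A sketch, reusing $fs and $fm from the loop above:

$bytes = 0;
while (!feof($fs)) {
    $contents = fread($fs, 4096);
    $bytes += strlen($contents);
    fwrite($fm, $contents);
}
// 'wrapper_data' holds the raw response header lines for http:// streams
$meta = stream_get_meta_data($fs);
$truncated = false;
foreach ($meta['wrapper_data'] as $header) {
    if (stripos($header, 'Content-Length:') === 0) {
        $truncated = $bytes < (int) trim(substr($header, 15));
    }
}
if ($truncated) {
    // treat it like a timeout: delete the temp file and let the loop retry
}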
This is a bit of a continuation of my previous thread (PHP CURL Chunked encoding a large file (700mb)), but I've now improvised something else.
Right now I'm using fread and then sending the file through cURL chunk by chunk (each chunk around 1MB). The idea is good and it does work, but it times out the server, so I was wondering if there is any way to reduce how often it sends a chunk, or some way to keep it from completely overloading my PHP process.
$length = (1024 * 1024) * 1; // 1MB per chunk
$handle = fopen($getFile, "r");
$chunk = 0;
while (($buffer = fread($handle, $length)) !== false) {
    if ($response = sendChunk($getServer, $buffer)) {
        $chunk++;
        print "Chunk " . $chunk . " Sent (Code: " . $response . ")! \n";
    }
}
The sendChunk function is:
function sendChunk($url, $chunk) {
    $POST_DATA = [
        'file' => base64_encode($chunk)
    ];
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_TIMEOUT, 2048);
    curl_setopt($curl, CURLOPT_POST, 1);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $POST_DATA);
    curl_exec($curl);
    $response = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    return $response;
}
I tried reading the file line by line, but that doesn't work, since a video file (mp4, wmv) is binary data rather than lines of text.
UPDATE: I have discovered the issue: the timing out was actually Cloudflare timing out when no HTTP response arrives in time. So I decided to run the script over SSH instead, and it worked fine... except for one thing.
After the file is successfully sent over, the script just keeps sending 0-byte chunks in an endless loop, and I was told it's because feof() isn't always accurate at detecting the end of the file. So I tried the (($buffer = fread($handle, $length)) !== false) trick, and it still does the same thing. Any ideas?
After working on this for around 8 hours, I noticed that I wasn't using $buffer to send the chunk, so now I have done that.
while (!feof($handle) && ($buffer = fread($handle, $length)) !== false) {
    if ($response = sendChunk($getServer, $buffer)) {
        $chunk++;
        print "Chunk " . $chunk . " Sent (Code: " . $response . ")! \n";
    }
}
Everything works fine. I did some other touch-ups, like checking for a response code of 200, but the core of it works.
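For reference (my note, not from the original thread): fread() signals end-of-file with an empty string, not false, which is why the !== false test alone never terminates; it's the feof() check that actually stops the loop above. A belt-and-braces version of the same loop, using the same variables, would be:

while (!feof($handle)) {
    $buffer = fread($handle, $length);
    if ($buffer === false || $buffer === '') {
        break; // read error or EOF: stop rather than sending empty chunks
    }
    if ($response = sendChunk($getServer, $buffer)) {
        $chunk++;
        print "Chunk " . $chunk . " Sent (Code: " . $response . ")! \n";
    }
}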
A lesson for anyone who is using Cloudflare and wants to transfer a file (up to 2GB) to another server via cURL.
There are better ways than just using cURL for this, in my opinion, but the client requested it be done this way, and it works.
Cloudflare has a maximum upload limit of 250MB for free users, and you cannot do chunked uploading through cURL's supported stream function, as Cloudflare still reads it as > 250MB in the header.
When I managed to get this code to work, it would time out on certain chunks because Cloudflare needs an HTTP response header within 100 seconds or it times out. Thankfully my script is executed via cron, so it doesn't need to go through Cloudflare to work. However, if you are looking to execute code in the browser, you may want to take a look at this: https://github.com/marcialpaulg/Fixing-Cloudflare-Error-524
I'm writing a script that communicates with a server via XML. I can tell I'm making successful requests to the server's API because I can see in a log on the server it's receiving them, however I'm having a hard time receiving the response (XML). I do not own the server and unfortunately cannot modify any of the programs sending the response.
I don't think the server is specifying the end of the file, so doing a while (!feof($fp)) { ... } hangs. And unfortunately I don't think I have any way (to my knowledge) of determining the size of the response before reading it.
What I am doing and what I have attempted:
function postXMLSocket($server, $path, $port, $xmlDocument) {
    $contentLength = strlen($xmlDocument);
    $result = '';
    // Handling error case in else statement below
    if ($fp = @fsockopen($server, $port, $errno, $errstr, 30)) {
        $out = "POST / HTTP/1.0\r\n";
        $out .= "Host: " . $server . "\r\n";
        $out .= "Content-Type: text/xml\r\n";
        $out .= "Content-Length: " . $contentLength . "\r\n";
        $out .= "Connection: close\r\n";
        $out .= "\r\n"; // all headers sent
        $out .= $xmlDocument;
        fwrite($fp, $out);

        // ATTEMPT 5: Read until we have a valid XML doc -- hangs
        // libxml_use_internal_errors(true);
        // do {
        //     $result .= fgets($fp, 128);
        //     $xmlTest = simplexml_load_string($result);
        // } while ($xmlTest === false);

        // ATTEMPT 4: Read X # of lines -- works but I can't know how many lines the response will be
        // for ($i = 0; $i < 10; $i++) {
        //     $result .= fgets($fp, 128);
        // }

        // ATTEMPT 3: Read until the lines being read are empty -- hangs
        // do {
        //     $lineRead = fgets($fp, 500);
        //     $result .= $lineRead;
        // } while (strlen($lineRead) > 0);

        // ATTEMPT 2: Read the whole file w/ fread -- only reads part of the file
        // $result = fread($fp, 8192);

        // ATTEMPT 1: Read to the EOF -- hangs
        // while (!feof($fp)) {
        //     $result .= fgets($fp, 128);
        // }

        fclose($fp);
    } else {
        // Could not connect to socket
        return false;
    }
    return $result;
}
Attempt descriptions:
1) First I just tried reading lines until reaching the end of the file. This keeps hanging and timing out, and I think it's because the server isn't marking the end of the XML file it's responding with, so the loop never terminates.
2) Second I tried to read the response as one whole file. This worked and I got something back, but it was incomplete (the response seems to be quite large). I have no way of knowing how big the response will be before reading it, so I don't think this is an option.
3) Next I tried reading until fgets returns an empty string, on the assumption that it would do so once reading past the end of the file, but this hangs as well.
4) For this attempt I just tried to read a hardcoded number of lines (10 in this case), but this has a similar problem to attempt 2 above: I can't know how many lines the response will have until after reading it.
5) This is where I thought I was getting clever. I know the response will be XML and will be contained in a <Response> node, so I thought I could get away with reading until the $result variable contained a valid XML string; however, this seems to hang as well.
Using a higher level approach to HTTP requests will probably help you. Try this:
$stringWithSomeXml = "your payload xml here";
postXml("www.google.com", "/path/on/server", 80, $stringWithSomeXml);

function postXml($server, $path, $port, $xmlPayload)
{
    $ch = curl_init();
    $path = ltrim($path, "/");
    // build a plain http:// URL; only append the port when it isn't the default
    if ($port == 80) {
        $url = "http://{$server}/{$path}";
    } else {
        $url = "http://{$server}:{$port}/{$path}";
    }
    echo "\n$url\n";
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt(
        $ch,
        CURLOPT_HTTPHEADER,
        [
            "Content-type: application/xml",
            "Content-Length: " . strlen($xmlPayload)
        ]
    );
    curl_setopt($ch, CURLOPT_POSTFIELDS, $xmlPayload);
    curl_setopt($ch, CURLOPT_POST, 1);
    $result = curl_exec($ch);
    echo "length: " . strlen($result) . "\n";
    echo "content: " . $result . "\n";
    curl_close($ch);
    return $result;
}
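If cURL isn't an option and you have to stay on raw sockets, the usual fix for the hang (not from the original answer, just a sketch) is to parse the Content-Length response header and then read exactly that many body bytes instead of waiting for feof():

// after fwrite($fp, $out): read header lines up to the blank separator line
$contentLength = null;
while (($line = fgets($fp, 1024)) !== false && rtrim($line, "\r\n") !== '') {
    if (stripos($line, 'Content-Length:') === 0) {
        $contentLength = (int) trim(substr($line, 15));
    }
}
// then read exactly $contentLength bytes of body (or until the server closes)
$result = '';
while ($contentLength === null || strlen($result) < $contentLength) {
    $chunk = fread($fp, 8192);
    if ($chunk === false || $chunk === '') {
        break; // connection closed or read error
    }
    $result .= $chunk;
}

This assumes the server doesn't send Transfer-Encoding: chunked, which the HTTP/1.0 request above should rule out.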
I have the following (stripped-down) piece of code:
function curl_request_async($url, $params)
{
    foreach ($params as $key => $val) {
        $post_params[] = $key . '=' . urlencode($val);
    }
    $post_string = implode('&', $post_params);
    $parts = parse_url($url);
    $fp = fsockopen($parts['host'],
        isset($parts['port']) ? $parts['port'] : 80,
        $errno, $errstr, 30);
    fwrite($fp, "POST " . $parts['path'] . " HTTP/1.1\r\n");
    fwrite($fp, "Host: " . $parts['host'] . "\r\n");
    fwrite($fp, "Content-Type: application/x-www-form-urlencoded\r\n");
    fwrite($fp, "Content-Length: " . strlen($post_string) . "\r\n");
    fwrite($fp, "Connection: Close\r\n\r\n");
    $bytes_written = fwrite($fp, $post_string);
    var_dump($bytes_written, strlen($post_string));
    // fread($fp, 1);
    // fflush($fp);
    fclose($fp);
}
The problem with this code is that I found no evidence the request reached the server being called. The line var_dump($bytes_written, strlen($post_string)); outputs int(493) int(493), so the server should have received all the data, yet it didn't.
If I uncomment fread($fp, 1); it works without a problem. That could be a working solution, but it doesn't seem to make sense. There has to be a better way!
My question then is two-fold: why does fread($fp, 1); fix my problem, and is there a better solution?
Your problem is probably that you wrote the server code in PHP, and ignore_user_abort is false by default (see http://php.net/manual/en/misc.configuration.php#ini.ignore-user-abort), so when you close the connection, the server stops executing your PHP code. That's why fread($fp, 1) fixes your problem: the connection doesn't close before PHP starts writing a response.
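If that diagnosis is right, the fix belongs in the receiving script; a minimal sketch (assuming the endpoint is plain PHP), not taken from the original answer:

// at the top of the PHP script that receives the fire-and-forget request
ignore_user_abort(true);     // keep running even if the caller disconnects
header('Content-Length: 0'); // tell the caller there is nothing to wait for
header('Connection: close');
flush();                     // push the (empty) response out immediately
// ...long-running work continues here after the caller has gone away...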
You can use this code to make a server to test whether it's actually connecting or not:
<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);
$sck = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
if ($sck === FALSE) {
    die('socket_create failed!');
}
if (!socket_set_block($sck)) {
    die("socket_set_block failed!");
}
if (!socket_bind($sck, '0.0.0.0', 1337)) {
    die("FAILED to bind to port 1337");
}
if (!socket_listen($sck, 0)) {
    die("socket_listen failed!");
}
$fullFile = '';
while ((print('listening for connections!' . PHP_EOL)) && false !== ($conn = socket_accept($sck))) {
    echo "new connection!" . PHP_EOL;
    while (false !== ($buffi = socket_recv($conn, $buff, 1024, MSG_WAITALL))) {
        if ($buffi === 0) {
            break; // socket_recv's way of saying that the connection closed,
                   // apparently. The docs say it should return false, but it
                   // doesn't; it just infinitely returns int(0).
                   // At least on Windows 7 x64 SP1.
        }
        $fullFile .= $buff;
        echo "received " . strlen($fullFile) . " bytes..." . PHP_EOL;
        $buff = ''; // do I need to clear it? or will recv do it for me?
    }
    echo "all bytes received (I guess; TODO: confirm with socket_last_error).";
    echo PHP_EOL;
    var_dump($fullFile);
    echo "done!" . PHP_EOL;
}
die("should never reach this code...");
It will make a netcat-style server listening on 127.0.0.1:1337.
fread needs two parameters: a resource and a length, the maximum number of bytes to read.
Right now you are only reading 1 byte: fread($fp, 1);
If you want to read the complete result, loop until it has been read completely:
while (!feof($fp)) {
    echo fread($fp, 128);
}
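Alternatively, stream_get_contents() wraps that loop up for you (assuming the whole response fits in memory):

$result = stream_get_contents($fp); // reads from the current position until EOF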
I am looking for a function that gets the metadata of an .mp3 file from a URL (NOT a local .mp3 file on my server).
Also, I don't want to install http://php.net/manual/en/id3.installation.php or anything similar on my server.
I am looking for a standalone function.
Right now I am using this function:
<?php
function getfileinfo($remoteFile)
{
    $url = $remoteFile;
    $uuid = uniqid("designaeon_", true);
    $file = "../temp/" . $uuid . ".mp3";
    $size = 0;
    $ch = curl_init($remoteFile);

    //============================== Get Size ==========================//
    $contentLength = 'unknown';
    $ch1 = curl_init($remoteFile);
    curl_setopt($ch1, CURLOPT_NOBODY, true);
    curl_setopt($ch1, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch1, CURLOPT_HEADER, true);
    curl_setopt($ch1, CURLOPT_FOLLOWLOCATION, true); // not necessary unless the file redirects
    $data = curl_exec($ch1);
    curl_close($ch1);
    if (preg_match('/Content-Length: (\d+)/', $data, $matches)) {
        $contentLength = (int)$matches[1];
        $size = $contentLength;
    }
    //============================== Get Size ==========================//

    if (!$fp = fopen($file, "wb")) {
        echo 'Error opening temp file for binary writing';
        return false;
    } else if (!$urlp = fopen($url, "r")) {
        echo 'Error opening URL for reading';
        return false;
    }
    try {
        $to_get = 65536;    // 64 KB
        $chunk_size = 4096; // Haven't bothered to tune this, maybe other values would work better??
        $got = 0;
        $data = null;
        // Grab the first 64 KB of the file
        while (!feof($urlp) && $got < $to_get) {
            $data = $data . fgets($urlp, $chunk_size);
            $got += $chunk_size;
        }
        fwrite($fp, $data);
        // Grab the last 64 KB of the file, if we know how big it is
        if ($size > 0) {
            curl_setopt($ch, CURLOPT_FILE, $fp);
            curl_setopt($ch, CURLOPT_HEADER, 0);
            curl_setopt($ch, CURLOPT_RESUME_FROM, $size - $to_get);
            curl_exec($ch);
        }
        // Now $fp should be the first and last 64KB of the file!!
        @fclose($fp);
        @fclose($urlp);
    } catch (Exception $e) {
        @fclose($fp);
        @fclose($urlp);
        echo 'Error transferring file using fopen and cURL !!';
        return false;
    }
    $getID3 = new getID3;
    $filename = $file;
    $ThisFileInfo = $getID3->analyze($filename);
    getid3_lib::CopyTagsToComments($ThisFileInfo);
    unlink($file);
    return $ThisFileInfo;
}
?>
This function downloads 64KB from the URL of an .mp3 file, then returns the metadata array by using getID3 (which works on local .mp3 files only), and then deletes the 64KB it downloaded.
The problem with this function is that it is way too slow by nature (it downloads 64KB per .mp3; imagine 1000 mp3 files).
To make my question clear: I need a fast standalone function that reads the metadata of a remote .mp3 file by URL.
Yeah, well what do you propose? How do you expect to get data if you don't get data? There is no way to have a generic remote HTTP server send you just the ID3 data. Really, there is no magic. Think about it.
What you're doing now is already pretty solid, except that it doesn't handle all versions of ID3 and won't work for files with more than 64KB of ID3 tags. What I would do to improve it is to use multi-cURL.
There are several PHP classes available that make this easier:
https://github.com/jmathai/php-multi-curl
$mc = EpiCurl::getInstance();
$results[] = $mc->addUrl(/* Your stream URL here */); // Run this in a loop, 10 at a time or so
foreach ($results as $result) {
    // Do something with the data.
}
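If you'd rather not pull in a library, the same fan-out can be sketched with PHP's built-in curl_multi functions (the $urls array here is hypothetical, and the Range trick requires the server to honor byte-range requests):

$urls = array(/* your remote .mp3 URLs */);
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $u) {
    $ch = curl_init($u);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_RANGE, '0-65535'); // ask for only the first 64KB
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}
do {
    curl_multi_exec($mh, $running); // drive all transfers in parallel
    if (curl_multi_select($mh) === -1) {
        usleep(100000); // avoid a busy loop if select() fails
    }
} while ($running > 0);
foreach ($handles as $ch) {
    $first64k = curl_multi_getcontent($ch); // hand this to getID3 as before
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);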
Here's my code:
$language = $_GET['soundtype'];
$word = $_GET['sound'];
$word = urlencode($word);

if ($language == 'english') {
    $url = "<the first url>";
} else if ($language == 'chinese') {
    $url = "<the second url>";
}

$opts = array(
    'http' => array(
        'method' => "GET",
        'header' => "User-Agent: <my user agent>"
    )
);

$context = stream_context_create($opts);

$page = file_get_contents($url, false, $context);

header('Content-Type: audio/mpeg');
echo $page;
But I've found that this runs terribly slow.
Are there any possible methods of optimization?
Note: $url is a remote url.
It's slow because file_get_contents() reads the entire file into $page, and PHP waits for the whole file to be received before outputting anything. So what you're doing is downloading the entire file on the server side, then outputting it as a single huge string.
file_get_contents() does not support streaming or grabbing offsets of the remote file. An option is to create a raw socket with fsockopen(), do the HTTP request yourself, and read the response in a loop, outputting each chunk to the browser as you read it. This will be faster because the file is streamed.
Example from the Manual:
$fp = fsockopen("www.example.com", 80, $errno, $errstr, 30);
if (!$fp) {
echo "$errstr ($errno)<br />\n";
} else {
header('Content-Type: audio/mpeg');
$out = "GET / HTTP/1.1\r\n";
$out .= "Host: www.example.com\r\n";
$out .= "Connection: Close\r\n\r\n";
fwrite($fp, $out);
while (!feof($fp)) {
echo fgets($fp, 128);
}
fclose($fp);
}
The above loops while there is still content available; on each iteration it reads 128 bytes and outputs them to the browser. The same principle will work for what you're doing. You'll need to make sure you don't output the response HTTP headers, which will be the first few lines: because you are doing a raw request, you get the raw response with headers included, and if you output them you will end up with a corrupt file.
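Skipping the headers is the same trick used further up this page: read lines until the blank line that separates headers from body, then echo only what follows. A sketch of the loop with that added (same $fp as above):

// first consume the response headers, up to and including the blank line
while (!feof($fp) && rtrim(fgets($fp, 1024), "\r\n") !== '') {
    ; // discard header lines
}
// now only body bytes reach the browser
while (!feof($fp)) {
    echo fread($fp, 8192);
    flush(); // hand each chunk to the client right away
}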
Instead of downloading the whole file before outputting it, consider streaming it out like this:
$in = fopen($url, 'rb', false, $context);
$out = fopen('php://output', 'wb');
header('Content-Type: video/mpeg');
stream_copy_to_stream($in, $out);
If you're daring, you could even try (but that's definitely experimental):
header('Content-Type: video/mpeg');
copy($url, 'php://output');
Another option is using internal redirects and making your web server proxy the request for you. That would free up PHP to do something else. See also my post regarding X-Sendfile and friends.
As explained by @MrCode, first downloading the file to your server, then passing it on to the client will of course incur a doubled download time. If you want to pass the file on to the client directly, use readfile.
Alternatively, consider whether you can simply redirect the client to the file URL using header("Location: $url"), so the client can get the file directly from the source.
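A minimal sketch of those last two suggestions, as mutually exclusive options (assuming $url is safe to expose to the client):

// Option 1: proxy the bytes through PHP without buffering the whole file
header('Content-Type: audio/mpeg');
readfile($url); // streams the remote file straight to the output

// Option 2: don't proxy at all; send the client to the source
header("Location: $url");
exit;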