PHP: fseek() for large file (>2GB)

I have a very large file (about 20GB). How can I use fseek() to jump around and read its content?
The code looks like this:
function read_bytes($f, $offset, $length) {
    fseek($f, $offset);
    return fread($f, $length);
}
The result is only correct if $offset < 2147483647.
Update: I am running on 64-bit Windows.
phpinfo reports Architecture: x64, but
PHP_INT_MAX: 2147483647

WARNING: as noted in the comments, fseek() uses an int internally, so it simply can't work
with files this large on PHP builds with 32-bit integers. The following solution
won't work. It is left here for reference only.
A little searching led me to the comments on the PHP manual page for fseek():
http://php.net/manual/en/function.fseek.php
The problem is the maximum int size of the offset parameter, but it seems you can work around it by making multiple fseek() calls with the SEEK_CUR option, combined with a big-number library such as GMP.
Example:
function fseek64(&$fh, $offset)
{
    fseek($fh, 0, SEEK_SET);
    $t_offset = '' . PHP_INT_MAX;
    while (gmp_cmp($offset, $t_offset) == 1)
    {
        $offset = gmp_sub($offset, $t_offset);
        fseek($fh, gmp_intval($t_offset), SEEK_CUR);
    }
    return fseek($fh, gmp_intval($offset), SEEK_CUR);
}
fseek64($f, '23456781232');
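Since fseek() cannot take an offset beyond PHP_INT_MAX, it can help to check an offset before seeking rather than get a silently wrong read. The following is a minimal sketch, assuming offsets arrive as non-negative decimal strings; can_seek_to() is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper: true if $offset (a decimal string) fits into a native
// PHP integer on this build and can therefore be passed to fseek().
function can_seek_to(string $offset): bool
{
    if ($offset === '' || !ctype_digit($offset)) {
        return false; // only plain non-negative decimal strings are accepted
    }
    $offset = ltrim($offset, '0');
    if ($offset === '') {
        return true; // the offset was zero
    }
    // Compare as strings so the check itself cannot overflow.
    $max = (string) PHP_INT_MAX;
    if (strlen($offset) !== strlen($max)) {
        return strlen($offset) < strlen($max);
    }
    return strcmp($offset, $max) <= 0;
}
```

On a build where PHP_INT_MAX is 2147483647, can_seek_to('23456781232') returns false, which signals that a plain fseek() is not an option there.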

For my project, I needed to READ blocks of 10KB from a BIG offset in a BIG file (>3 GB). Writes were always appends, so no offsets were needed.
This approach works regardless of which PHP version and OS you are using, because the web server does the seeking instead of PHP.
Prerequisite: your server must support HTTP Range requests. Apache and IIS already support this, as do most other web servers (shared hosting or otherwise).
// offset, 3GB+
$start=floatval(3355902253);
// bytes to read, 100 KB
$len=floatval(100*1024);
// set up the http byte range headers
$opts = array('http'=>array('method'=>'GET','header'=>"Range: bytes=$start-".($start+$len-1)));
$context = stream_context_create($opts);
// bytes ranges header
print_r($opts);
// change the URL below to the URL of your file. DO NOT change it to a file path.
// you MUST use a http:// URL for your file for a http request to work
// this will output the results
echo $result = file_get_contents('http://127.0.0.1/dir/mydbfile.dat', false, $context);
// status of your request
// if this is empty, the HTTP request did not fire.
print_r($http_response_header);
// Check your file URL and verify it by opening it directly in a web
// browser. If the HTTP response shows an error (status code >= 400), check that you are
// sending the correct Range header bytes. For example, if you give a start offset that
// exceeds the current file size, the server responds with 416 (Range Not Satisfiable).
// NOTE - The current file size is also returned in the HTTP response header
// Content-Range: bytes 355902253-355903252/355904253, the last number is the file size
...
...
...
SECURITY - you must add an .htaccess rule that denies all requests for this database file except those coming from the local IP 127.0.0.1.
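The note about reading the file size from the Content-Range response header can be sketched as follows. This is an illustration only: parse_content_range() is a hypothetical helper, and it assumes the header array has the same shape as $http_response_header (one header line per element).

```php
<?php
// Hypothetical helper (not part of PHP): find the Content-Range header in an
// array shaped like $http_response_header and split it into its parts.
function parse_content_range(array $headers): ?array
{
    foreach ($headers as $line) {
        if (preg_match('/^Content-Range:\s*bytes\s+(\d+)-(\d+)\/(\d+|\*)\s*$/i', $line, $m)) {
            // Keep the numbers as strings: on builds where PHP_INT_MAX is
            // 2147483647, an (int) cast would corrupt offsets past 2GB.
            return ['start' => $m[1], 'end' => $m[2], 'total' => $m[3]];
        }
    }
    return null; // no Content-Range header, e.g. the server ignored the Range request
}
```

After the file_get_contents() call, parse_content_range($http_response_header) would return the requested start, end, and total file size, or null if the server did not answer with a partial response.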

Related

Apache does not work with HTTP_RANGE?

I wanted to create a .php file that streams a video.
The problem is that it works only if I use a plain readfile(), but then you cannot seek backward or forward in the video. So I searched on Google and found this code:
(Basically, HTTP_RANGE never works and I do not know why. When testing, it always hits my die("lol?");, so Range requests are clearly not being handled for some reason.)
(The die() calls are left in on purpose; they will be removed once it works.)
(Note that I changed "$size = filesize($file);" to "$size = filesize(".".$file);" because someone mentioned that this is required, and "filesize($file);" does not work for me anyway; it always raises an error.)
(And $file shows the actual path of my file, nothing replaced; it is exactly how it looks in my original PHP.)
<?php
// Clears the cache and prevent unwanted output
ob_clean();
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(E_ALL);
apache_setenv('no-gzip', 1);
ini_set('zlib.output_compression', 'Off');
$file = "/cdn4-e663/zw4su8jiy8skgvizihsjehj/2038tkusi9u848sui7zh/2q3z6hjk97ujduz/a1-cdn/9zw35jbmhkk47wi63uu7.mp4"; // The media file's location
$mime = "application/octet-stream"; // The MIME type of the file, this should be replaced with your own.
$size = filesize(".".$file); // The size of the file
// Send the content type header
header('Content-type: ' . $mime);
// Check if it's a HTTP range request
if(isset($_SERVER['HTTP_RANGE'])){
    // Parse the range header to get the byte offset
    $ranges = array_map(
        'intval', // Parse the parts into integer
        explode(
            '-', // The range separator
            substr($_SERVER['HTTP_RANGE'], 6) // Skip the `bytes=` part of the header
        )
    );
    // If the last range param is empty, it means the EOF (End of File)
    if(!$ranges[1]){
        $ranges[1] = $size - 1;
    }
    // Send the appropriate headers
    header('HTTP/1.1 206 Partial Content');
    header('Accept-Ranges: bytes');
    header('Content-Length: ' . ($ranges[1] - $ranges[0] + 1)); // The size of the range (both ends are inclusive)
    // Send the ranges we offered
    header(
        sprintf(
            'Content-Range: bytes %d-%d/%d', // The header format
            $ranges[0], // The start range
            $ranges[1], // The end range
            $size // Total size of the file
        )
    );
    // It's time to output the file
    $f = fopen($file, 'rb'); // Open the file in binary mode
    $chunkSize = 8192; // The size of each chunk to output
    // Seek to the requested start range
    fseek($f, $ranges[0]);
    die("working?");
    // Start outputting the data
    while(true){
        // Check if we have outputted all the data requested
        if(ftell($f) >= $ranges[1]){
            break;
        }
        // Output the data
        echo fread($f, $chunkSize);
        // Flush the buffer immediately
        ob_flush();
        flush();
    }
}
else {
    die("lol?");
    header('Content-Length: ' . $size);
    // Read the file
    readfile($file);
    // and flush the buffer
    ob_flush();
    flush();
}
?>
So, the die("lol?"); was added by me to see whether the
if(isset($_SERVER['HTTP_RANGE'])){
/* branch fires or not, and no, it seems the condition is FALSE every time. */
}
So I wanted to ask you all: how can I fix this? I really want to use PHP to stream my video, for security reasons and because I like it. I already use this method with images, but with different (and working) code.
I am using Apache 2.4 (Windows 10, 64-bit PC) with the latest version of PHP 7, but it seems that Apache does not support HTTP_RANGE. Am I missing something? Is there something I need to enable in either php.ini or httpd.conf?
Thank you in advance. I hope someone can tell me what to do, because I am really stuck here. I tried ALL the examples of mp4 video streaming I could find on Google, and none worked for me :/
There are 2 parts to this:
1) The request made by the browser/client. This must send the appropriate request headers.
2) The response given by your server. This is produced by your PHP script and must also send the appropriate response headers.
When you try and stream your video (or whatever the content is) open the Network tab in your browser.
Look at the Request Headers (in Chrome these are under the Network tab). Note that the request should contain a Range: header. If this is not present in the request, you'll have problems. This is what tells the PHP script on the server that you are doing a range request in the first place. If the server does not see this header in the request, it will bypass the if statement and go straight to the die() in the else branch.
Note that the Range: request header is not normally included in requests by default, so unless you are specifying this, it will never work. If you don't see it in the Request Headers on your Network tab, it is not present, and you need to fix that.
You may also want to examine the response headers, which are entirely separate from the request headers. Again, these can be seen in the Network tab in your browser. For a range request, the script must send a 206 Partial Content status along with the Accept-Ranges, Content-Range, and Content-Length headers.
Going back to the original question, none of it has anything to do with the response (which is what you were describing). The initial problem you are having is all to do with how you're making the request and the fact it does not contain a Range: header, when it must do so.
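To make the request side concrete, here is a minimal sketch of how a script might interpret the Range: request header once the client does send it. parse_byte_range() is a hypothetical helper, not a PHP built-in, and it assumes a single range (multi-range requests such as bytes=0-99,200-299 are treated as unparseable).

```php
<?php
// Hypothetical helper: turn a "Range: bytes=..." value into [start, end]
// (both inclusive) for a file of $size bytes, or null if it cannot be served.
function parse_byte_range(string $header, int $size): ?array
{
    if (!preg_match('/^bytes=(\d*)-(\d*)$/', trim($header), $m)) {
        return null; // malformed or multi-range header
    }
    $first = $m[1];
    $last = $m[2];
    if ($first === '' && $last === '') {
        return null;
    }
    if ($first === '') {
        // Suffix form "bytes=-N": the last N bytes of the file.
        $n = (int) $last;
        if ($n === 0) {
            return null;
        }
        $start = max(0, $size - $n);
        $end = $size - 1;
    } else {
        $start = (int) $first;
        // An open end ("bytes=500-") means "until end of file".
        $end = ($last === '') ? $size - 1 : min((int) $last, $size - 1);
    }
    if ($start > $end || $start >= $size) {
        return null; // unsatisfiable; a server would answer 416 here
    }
    return [$start, $end];
}
```

With a result of [$start, $end], the matching response headers would be Content-Range: bytes $start-$end/$size and a Content-Length of $end - $start + 1, since both ends are inclusive.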

uploading large object to Cloudfiles returns different md5

So I have this code and I'm trying to upload large files as per https://github.com/rackspace/php-opencloud/blob/master/docs/userguide/ObjectStore/Storage/Object.md to Rackspace:
$src_path = 'pathtofile.zip'; //about 700MB
$md5_checksum = md5_file($src_path); //result is f210775ccff9b0e4f686ea49ac4932c2
$trans_opts = array(
'name' => $md5_checksum,
'concurrency' => 6,
'partSize' => 25000000
);
$trans_opts['path'] = $src_path;
$transfer = $container->setupObjectTransfer($trans_opts);
$response = $transfer->upload();
This apparently uploads the file just fine.
However, when I try to download the file as recommended here https://github.com/rackspace/php-opencloud/blob/master/docs/userguide/ObjectStore/USERGUIDE.md:
$name = 'f210775ccff9b0e4f686ea49ac4932c2';
$object = $container->getObject($name);
$objectContent = $object->getContent();
$pathtofile = 'destinationpathforfile.zip';
$objectContent->rewind();
$stream = $objectContent->getStream();
file_put_contents($pathtofile, $stream);
$md5 = md5_file($pathtofile);
The result of md5_file() ends up being different from 'f210775ccff9b0e4f686ea49ac4932c2'. Moreover, the downloaded zip ends up being unopenable/corrupted.
What did I do wrong?
It's recommended that you only use multipart uploads for files over 5GB. For files under this threshold, you can use the normal uploadObject method.
When you use the transfer builder, it segments your large file into smaller segments (you provide the part size) and concurrently uploads each one. When this process has finished, a manifest file is created which contains a list of all these segments. When you download the manifest file, it collates them all together, effectively pretending to be the big file itself. But it's just really an organizer.
To get back to answering your question, the ETag header of a manifest file is not calculated how you may think. What you're currently doing is taking the MD5 checksum of the entire 700MB file, and comparing it against the MD5 checksum of the manifest file. But these aren't comparable. To quote the documentation:
the ETag header is calculated by taking the ETag value of each segment, concatenating them together, and then returning the MD5 checksum of the result.
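The quoted rule can be sketched in a few lines of PHP. This is an illustration under assumptions, not the SDK's own code: segment_etags() and manifest_etag() are hypothetical names, and the sketch assumes the segments are consecutive slices of exactly partSize bytes and that each segment's ETag is the MD5 of its content.

```php
<?php
// Hypothetical helper: MD5 of each fixed-size segment of a local file,
// in upload order. Assumes segments are plain consecutive slices.
function segment_etags(string $path, int $partSize): array
{
    $etags = [];
    $fh = fopen($path, 'rb');
    while (!feof($fh)) {
        $chunk = fread($fh, $partSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $etags[] = md5($chunk);
    }
    fclose($fh);
    return $etags;
}

// Hypothetical helper: concatenate the segment ETags and MD5 the result,
// as the quoted documentation describes for a manifest.
function manifest_etag(array $segmentEtags): string
{
    return md5(implode('', $segmentEtags));
}
```

The value from manifest_etag(segment_etags($src_path, 25000000)) is what would be comparable to the manifest's ETag (possibly after stripping surrounding quotes), whereas md5_file($src_path) is a checksum of something else entirely.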
There are also downsides to using this DLO operation that you need to be aware of:
End-to-end integrity is not assured. The eventual consistency model means that although you have uploaded a segment object, it might not appear in the container list immediately. If you download the manifest before the object appears in the container, the object will not be part of the content returned in response to a GET request.
If you think there's been an error in transmission, perhaps it's because a HTTP request failed along the way. You can use retry strategies (using the backoff plugin) to retry failed requests.
You can also turn on HTTP logging to check every network transaction to help with debugging. Be careful, though: using the above will echo the HTTP request body (>25MB) to STDOUT. You might want to use this instead:
use Guzzle\Plugin\Log\LogPlugin;
use Guzzle\Log\ClosureLogAdapter;
$stream = fopen('php://output', 'w');
$logSubscriber = new LogPlugin(new ClosureLogAdapter(function($m) use ($stream) {
fwrite($stream, $m . PHP_EOL);
}), "# Request:\n{url} {method}\n\n# Response:\n{code} {phrase}\n\n# Connect time: {connect_time}\n\n# Total time: {total_time}", false);
$client->addSubscriber($logSubscriber);
As you can see, you're using a template to dictate what's outputted. There's a full list of template variables here.

PHP filesize of dynamically chosen file

I have a php script that needs to determine the size of a file on the file system after being manipulated by a separate php script.
For example, there exists a zip file that has a fixed size but gets an additional file of unknown size inserted into it based on the user that tries to access it. So the page that's serving the file is something like getfile.php?userid=1234.
So far, I know this:
filesize('getfile.php'); //returns the actual file size of the php file, not the result of script execution
readfile('getfile.php'); //same as filesize()
filesize('getfile.php?userid=1234'); //returns false, as it can't find the file matching the name with GET vars attached
readfile('getfile.php?userid=1234'); //same as filesize()
Is there a way to read the result size of the php script instead of just the php file itself?
From the filesize() manual:
As of PHP 5.0.0, this function can also be used with some URL
wrappers.
Something like
filesize('http://localhost/getfile.php?userid=1234');
should be enough.
Someone had posted an option for using curl to do this but removed their answer after a downvote. Too bad, because it's the one way I've gotten this to work. So here's their answer that worked for me:
$ch = curl_init('http://localhost/getfile.php?userid=1234');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); //This was not part of the poster's answer, but I needed to add it to prevent the file being read from outputting with the requesting script
curl_exec($ch);
$size = 0;
if(!curl_errno($ch))
{
$info = curl_getinfo($ch);
$size = $info['size_download'];
}
curl_close($ch);
echo $size;
The only way to get the size of the output is to run the script and then look. The result may differ from run to run depending on the script, so for practical use the best approach may be to estimate based on what you know: if you have a 5MB file and add another 5KB of user-specific content, it is still about 5MB in the end.
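If the generating code can be called in the same process, "run it and then look" can be done with output buffering instead of an HTTP request. A minimal sketch, assuming the output fits in memory; output_size() is a hypothetical helper, and the callable stands in for whatever getfile.php does.

```php
<?php
// Hypothetical helper: run $render, capture everything it echoes, and
// return the number of bytes produced.
function output_size(callable $render): int
{
    ob_start();
    try {
        $render();
    } finally {
        // Always close the buffer, even if $render throws.
        $out = ob_get_clean();
    }
    return strlen($out);
}
```

For example, output_size(function () { include 'getfile.php'; }) would measure the script's output, provided the script reads its parameters in a way that still works when it is included rather than requested over HTTP.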
To expand on Ivan's answer:
Your string is 'getfile.php', with or without GET parameters. It is treated as a local file, so you get the file size of the PHP file itself.
It is treated as a local file because it does not start with the http:// protocol. See http://us1.php.net/manual/en/wrappers.php for supported protocols.
When using filesize() I got a warning:
Warning: filesize() [function.filesize]: stat failed for ...link... in ..file... on line 233
Instead of filesize(), I found two working options to replace it:
1)
$headers = get_headers($pdfURL, 1);
$fileSize = $headers['Content-Length'];
echo $fileSize;
2)
echo strlen(file_get_contents($pdfURL));
Now it's working fine.

Issue to determine a currently downloading file size?

I have an interesting problem. I need to build a progress bar for a file that a PHP script is downloading asynchronously. I thought the best way to do it is for the script, before the download starts, to write a txt file containing the file name and the original file size.
Now we have an AJAX function that calls a PHP script intended to check the local file size. I have 2 main problems:
1) The files are bigger than 2GB, so the filesize() function is out of the question.
2) I tried to find a different way to determine the local file size, like this:
function getSize($filename) {
$a = fopen($filename, 'r');
fseek($a, 0, SEEK_END);
$filesize = ftell($a);
fclose($a);
return $filesize;
}
Unfortunately, the second way gives me tons of errors, which suggests that I cannot open a file that is currently downloading.
Is there any way I can check the size of a file that is currently downloading and that will be bigger than 2 GB?
Any help is greatly appreciated.
I found the solution by using the exec() function:
exec("ls -s -k " . escapeshellarg("/path/to/your/file/" . $file_name), $out);
Just switch to an OS and PHP build that support 64-bit integers, and you can still use filesize().
From filesize() manual:
Return Values
Returns the size of the file in bytes, or FALSE (and generates an
error of level E_WARNING) in case of an error.
Note: Because PHP's integer type is signed and many platforms use
32bit integers, some filesystem functions may return unexpected
results for files which are larger than 2GB.

Cannot resume downloads bigger than 300M

I am working on a PHP program to download files.
The script request looks like: http://localhost/download.php?file=abc.zip
I use a script mentioned in "Resumable downloads when using PHP to send the file?"
It definitely works for files under 300M, with either multi-threaded or single-threaded downloads. But when I try to download a file larger than 300M in a single thread, only about 250M of data arrives and then the HTTP connection appears to break. It does not reach the break-point. Why?
Debugging the script, I pinpointed where it broke:
$max_bf_size = 10240;
$pf = fopen("$file_path", "rb");
fseek($pf, $offset);
while(1)
{
$rd_length = $length < $max_bf_size? $length:$max_bf_size;
$data = fread($pf, $rd_length);
print $data;
$length = $length - $rd_length;
if( $length <= 0 )
{
//__break-point__
break;
}
}
It seems like every request can only echo or print about 250M of data. But it works when I use a multi-threaded download.
fread() will read up to the number of bytes you ask for, so you are doing some unnecessary work calculating the number of bytes to read. I don't know what you mean by single-thread and multi-thread downloading. Do you know about readfile() to just dump an entire file? I assume you need to read a portion of the file starting at $offset up to $length bytes, correct?
I'd also check my web server (Apache?) configuration and ISP limits if applicable; your maximum response size or time may be throttled.
Try this:
define('MAX_BUF_SIZE', 10240); // the constant name must be a quoted string
$pf = fopen($file_path, 'rb');
fseek($pf, $offset);
while (!feof($pf)) {
    $data = fread($pf, MAX_BUF_SIZE);
    if ($data === false) {
        break;
    }
    print $data;
}
fclose($pf);
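The loop above reads to the end of the file, whereas a resumed download usually needs exactly $length bytes starting at $offset. A minimal sketch of that variant, written against stream resources so it can be tried without a web server; stream_range() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: copy at most $length bytes from $in, starting at
// $offset, to $out in chunks of $bufSize. Returns the bytes actually copied.
function stream_range($in, $out, int $offset, int $length, int $bufSize = 10240): int
{
    if (fseek($in, $offset) !== 0) {
        return 0; // could not seek to the requested offset
    }
    $sent = 0;
    while ($length > 0 && !feof($in)) {
        // Never ask for more than what is still owed to the client.
        $data = fread($in, min($bufSize, $length));
        if ($data === false || $data === '') {
            break;
        }
        fwrite($out, $data);
        $n = strlen($data);
        $sent += $n;
        $length -= $n;
    }
    return $sent;
}
```

In a download script, $out would typically be fopen('php://output', 'wb'), with a flush() after each chunk. If a transfer still stops partway, the script's execution time limit and the server-side limits mentioned above are worth checking.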
