Stream audio/video files from GridFS in the browser - PHP

I have been trying to read an audio file from MongoDB that I stored using GridFS. I can download the file to the filesystem and play it from there, but I want to stream those audio/video files directly from the DB and play them in the browser. Is there any way to do that without downloading the file to the system? Any help would be appreciated.

The PHP GridFS support has a MongoGridFSFile::getResource() method that exposes the file as a stream resource, so the whole file never has to be loaded into memory. Combined with fread/echo or stream_copy_to_stream, you can send the data to the client piece by piece. With stream_copy_to_stream, you can simply copy the GridFSFile stream resource to the STDOUT stream:
<?php
// Connect and select the GridFS collection
$m = new MongoClient;
$images = $m->my_db->getGridFS('images');

// Look the file up by its filename
$image = $images->findOne('mongo.png');

header('Content-type: image/png');

// Copy the GridFS stream straight to the output, chunk by chunk
$stream = $image->getResource();
stream_copy_to_stream($stream, STDOUT);
?>
Alternatively, you can use fseek() on the returned $stream resource to only send back parts of the stream to the client. Combined with HTTP Range requests, you can do this pretty efficiently.
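For example, a minimal sketch of serving a single Range request (reusing $image and $stream from the example above; real code would also validate the header and the range bounds):

// Sketch: serve a "Range: bytes=start-end" request from the GridFS stream.
$size = $image->getSize();

if (isset($_SERVER['HTTP_RANGE']) && preg_match('/bytes=(\d+)-(\d*)/', $_SERVER['HTTP_RANGE'], $matches)) {
    $start = (int) $matches[1];
    $end   = ($matches[2] !== '') ? (int) $matches[2] : $size - 1;

    header('HTTP/1.1 206 Partial Content');
    header("Content-Range: bytes $start-$end/$size");
    header('Content-Length: ' . ($end - $start + 1));

    // seek to the requested offset and send only that slice
    fseek($stream, $start);
    echo fread($stream, $end - $start + 1);
} else {
    header('Content-Length: ' . $size);
    stream_copy_to_stream($stream, STDOUT);
}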

If the recipe above fails, for example with NginX and php-fpm because STDOUT is not available under fpm, you can use
fpassthru($stream);
instead of
stream_copy_to_stream( $stream, STDOUT );
So a complete solution looks like:
function img($nr)
{
    $mongo = new MongoClient();

    // look the file up by a number stored in its metadata instead of by filename
    $img = $mongo->ai->getGridFS('img')->findOne(array('metadata.nr' => $nr));
    if (!$img) {
        err("not found");
    }

    // disable NginX response buffering so the file is streamed
    header('X-Accel-Buffering: no');
    header("Content-type: " . $img->file["contentType"]);
    header("Content-length: " . $img->getSize());

    fpassthru($img->getResource());
    exit(0);
}
FYI, in this example:

The file is not accessed by its filename; instead it is accessed by a number stored in its metadata. Hint: you can set a unique index to ensure that no number can be used twice (see the sketch after this list).
The Content-Type is read from GridFS too, so you do not need to hardcode it.
NginX caching is switched off to enable streaming.

This way you can even handle other things like video or HTML pages. If you want to enable NginX caching, consider sending X-Accel-Buffering only for larger files.
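A minimal sketch of that unique-index hint, assuming the GridFS prefix 'img' (so the file documents live in the img.files collection):

// Sketch: enforce that no metadata.nr can be used twice.
$mongo = new MongoClient();
$files = $mongo->ai->selectCollection('img.files');
$files->ensureIndex(array('metadata.nr' => 1), array('unique' => true));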

Related

uploading large object to Cloudfiles returns different md5

So I have this code and I'm trying to upload large files as per https://github.com/rackspace/php-opencloud/blob/master/docs/userguide/ObjectStore/Storage/Object.md to Rackspace:
$src_path = 'pathtofile.zip'; //about 700MB
$md5_checksum = md5_file($src_path); //result is f210775ccff9b0e4f686ea49ac4932c2

$trans_opts = array(
    'name' => $md5_checksum,
    'concurrency' => 6,
    'partSize' => 25000000
);
$trans_opts['path'] = $src_path;

$transfer = $container->setupObjectTransfer($trans_opts);
$response = $transfer->upload();
Which allegedly uploads the file just fine
However when I try to download the file as recommended here https://github.com/rackspace/php-opencloud/blob/master/docs/userguide/ObjectStore/USERGUIDE.md:
$name = 'f210775ccff9b0e4f686ea49ac4932c2';
$object = $container->getObject($name);
$objectContent = $object->getContent();
$pathtofile = 'destinationpathforfile.zip';
$objectContent->rewind();
$stream = $objectContent->getStream();
file_put_contents($pathtofile, $stream);
$md5 = md5_file($pathtofile);
The result of md5_file ends up being different from 'f210775ccff9b0e4f686ea49ac4932c2'. Moreover, the downloaded zip ends up being unopenable/corrupted.
What did I do wrong?
It's recommended that you only use multipart uploads for files over 5GB. For files under this threshold, you can use the normal uploadObject method.
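For example, a plain upload could look roughly like this (a sketch; $container is the same ObjectStore container and the object name is illustrative):

// Sketch: single-request upload for a file under the 5GB threshold.
$handle = fopen($src_path, 'r');
$object = $container->uploadObject('backup.zip', $handle);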
When you use the transfer builder, it segments your large file into smaller segments (you provide the part size) and concurrently uploads each one. When this process has finished, a manifest file is created which contains a list of all these segments. When you download the manifest file, it collates them all together, effectively pretending to be the big file itself. But it's just really an organizer.
To get back to answering your question, the ETag header of a manifest file is not calculated how you may think. What you're currently doing is taking the MD5 checksum of the entire 700MB file, and comparing it against the MD5 checksum of the manifest file. But these aren't comparable. To quote the documentation:
the ETag header is calculated by taking the ETag value of each segment, concatenating them together, and then returning the MD5 checksum of the result.
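In other words, the value you should compare against is built from the segments' ETags, roughly like this (a sketch; $segmentEtags is a hypothetical array holding each segment's ETag in upload order):

// Sketch: how a manifest's ETag relates to its segments.
$expectedManifestEtag = md5(implode('', $segmentEtags));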
There are also downsides to using this DLO operation that you need to be aware of:
End-to-end integrity is not assured. The eventual consistency model means that although you have uploaded a segment object, it might not appear in the container list immediately. If you download the manifest before the object appears in the container, the object will not be part of the content returned in response to a GET request.
If you think there's been an error in transmission, perhaps it's because a HTTP request failed along the way. You can use retry strategies (using the backoff plugin) to retry failed requests.
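With the Guzzle client used by php-opencloud, that could look something like this (a sketch; the exact wiring depends on how your client is set up):

use Guzzle\Plugin\Backoff\BackoffPlugin;

// Sketch: retry failed requests up to 3 times with exponential backoff.
$client->addSubscriber(BackoffPlugin::getExponentialBackoff(3));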
You can also turn on HTTP logging to check every network transaction, which helps with debugging. Be careful, though: the default wire logging would echo the HTTP request body (>25MB) to STDOUT. You might want to use this instead:
use Guzzle\Plugin\Log\LogPlugin;
use Guzzle\Log\ClosureLogAdapter;

$stream = fopen('php://output', 'w');

$logSubscriber = new LogPlugin(new ClosureLogAdapter(function ($m) use ($stream) {
    fwrite($stream, $m . PHP_EOL);
}), "# Request:\n{url} {method}\n\n# Response:\n{code} {phrase}\n\n# Connect time: {connect_time}\n\n# Total time: {total_time}", false);

$client->addSubscriber($logSubscriber);
As you can see, you're using a template to dictate what gets output. There's a full list of template variables here.

Decompressing an LZO stream in PHP

I have a number of LZO-compressed log files on Amazon S3, which I want to read from PHP. The AWS SDK provides a nice StreamWrapper for reading these files efficiently, but since the files are compressed, I need to decompress the content before I can process it.
I have installed the PHP-LZO extension which allows me to do lzo_decompress($data), but since I'm dealing with a stream rather than the full file contents, I assume I'll need to consume the string one LZO compressed block at a time. In other words, I want to do something like:
$s3 = S3Client::factory( $myAwsCredentials );
$s3->registerStreamWrapper();

$stream = fopen("s3://my_bucket/my_logfile", 'r');
$compressed_data = '';

while (!feof($stream)) {
    $compressed_data .= fread($stream, 1024);

    // TODO: determine if we have a full LZO block yet
    if (contains_full_lzo_block($compressed_data)) {
        // TODO: extract the LZO block
        $lzo_block = get_lzo_block($compressed_data);
        $input = lzo_decompress( $lzo_block );
        // ...... and do stuff to the decompressed input
    }
}
fclose($stream);
The two TODOs are where I'm unsure what to do:
Inspecting the data stream to determine whether I have a full LZO block yet
Extracting this block for decompression
Since the compression was done by Amazon (s3distCp) I don't have control over the block size, so I'll probably need to inspect the incoming stream to determine how big the blocks are -- is this a correct assumption?
(ideally, I'd use a custom StreamFilter directly on the stream, but I haven't been able to find anyone who has done that before)
OK, executing a command via PHP can be done in many different ways, something like:
$command = 'gunzip -c /path/src /path/dest';
$escapedCommand = escapeshellcmd($command);
system($escapedCommand);
or also
shell_exec('gunzip -c /path/src /path/dest');
will do the work.
Now it's a matter of which command to execute. Under Linux there's a nice command-line tool called lzop which extracts or compresses lzop files.
You can use it via something like:
lzop -dN sources.lzo
So your final code might be something as easy as:
shell_exec('lzop -dN s3://my_bucket/my_logfile');
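Note that lzop itself cannot read s3:// URLs, so a more complete sketch would let the SDK's stream wrapper copy the object to a temporary file first and then read lzop's decompressed output as a stream (assuming the lzop binary is installed; paths and names are illustrative):

// Sketch: copy the compressed object locally, then stream lzop's decompressed output.
$s3 = S3Client::factory($myAwsCredentials);
$s3->registerStreamWrapper();

$tmpLzo = sys_get_temp_dir() . '/my_logfile.lzo';
copy('s3://my_bucket/my_logfile', $tmpLzo);   // streamed copy, not loaded into memory at once

$handle = popen('lzop -dc ' . escapeshellarg($tmpLzo), 'r');
while (!feof($handle)) {
    $line = fgets($handle);
    // ... process the decompressed data line by line
}
pclose($handle);
unlink($tmpLzo);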

Download and rename a file via url with PHP

I have this URL
www1.intranet.com/reportingtool.asp?settings=var&export = ok
There I can download a report. The file name of the report includes a timestamp, e.g. 123981098298.xls, and varies every time I download it.
I want to have a script with this functions:
<?php
//Download the File
//rename it to **report.xls**
//save it to a specified place
?>
I don't have any idea after searching stackoverflow and googling on this topic :(
Is this generally possible?
The simplest scenario
You can download the report with file_get_contents:
$report = file_get_contents('http://www1.intranet.com/reportingtool.asp?...');
And save it locally (on the machine where PHP runs) with file_put_contents:
file_put_contents('/some/path/report.xls', $report);
More options
If downloading requires control over the HTTP request (e.g. because you need to use cookies or HTTP authentication), it can be done through cURL, which enables full customization of the request.
If the report is large, it could be streamed directly to the destination instead of doing read/store/write in three steps (for example, using fopen/fread/fwrite), as sketched below.
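A minimal sketch of that streaming variant (same URL as above; assumes allow_url_fopen is enabled):

// Sketch: stream the report straight to disk without holding it all in memory.
$src = fopen('http://www1.intranet.com/reportingtool.asp?settings=var&export=ok', 'r');
$dst = fopen('/some/path/report.xls', 'w');

stream_copy_to_stream($src, $dst);   // copies in internal chunks

fclose($src);
fclose($dst);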
This may not work depending on your security settings, but it's a simple example:
<?php
$file = file_get_contents('http://www1.intranet.com/reportingtool.asp?settings=var&export=ok');
file_put_contents('/path/to/your/location/report.xls', $file);
See file_get_contents and file_put_contents.

Avoiding the memory limit in MongoGridFSFile::getBytes - PHP

Suppose my server has 4GB of RAM and I uploaded a file of 5GB. How can I download that file using GridFS? The following page, http://www.php.net/manual/en/mongogridfsfile.getbytes.php, states that if your file is bigger than memory it's a problem, but it doesn't give a solution.
Does anyone have a solution for this?
I use this demo code to access a file.
<?php
// Connect to Mongo and set DB and Collection
$mongo = new Mongo();
$db = $mongo->myfiles;
// GridFS
$gridFS = $db->getGridFS();
// Find image to stream
$file = $gridFS->findOne("win.tar");
// Stream image to browser
header("Content-Description: File Transfer");
header("Content-Type: application/octet-stream");
header("Content-Disposition: attachment; filename=\"win.tar\"");
echo $file->getBytes();
?>
As of version 1.3.0 of the PHP Driver you can access the GridFS Files as a PHP stream, using $MongoGridFSFile->getResource().
Using that method you can iteratively read the data and print it out, avoiding the memory limitation on your server.
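Applied to the demo code above, that could look roughly like this (a sketch reusing the $file from the example):

// Sketch: replace echo $file->getBytes(); with a chunked read of the stream resource,
// so the file is never held in memory in one piece.
$stream = $file->getResource();
while (!feof($stream)) {
    echo fread($stream, 8192);   // send 8KB at a time to the client
}
fclose($stream);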
Alternatively, just split the source file into chunks and save meta information about each of these chunks in MongoDB; each of your chunks will then be an ordinary file in GridFS.
After that, you have a meta layer with metadata about the source file.
You also have to solve the reverse problem: downloading the chunk files from GridFS and reassembling the source file from them.
You can choose the size of these chunks based on your network speed and bandwidth limits; these chunks are different from the chunks GridFS uses internally. A rough sketch of the upload side follows below.
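A rough sketch of that idea (the 100MB part size and the metadata keys are illustrative choices, not part of the original answer):

// Sketch: store a big file as several smaller GridFS files plus ordering metadata.
$gridFS = (new MongoClient())->myfiles->getGridFS();
$partSize = 100 * 1024 * 1024;

$src = fopen('/path/to/huge-file.tar', 'r');
$part = 0;
while (!feof($src)) {
    $bytes = fread($src, $partSize);
    $gridFS->storeBytes($bytes, array(
        'filename' => 'huge-file.tar.part' . $part,
        'metadata' => array('source' => 'huge-file.tar', 'part' => $part),
    ));
    $part++;
}
fclose($src);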

Streaming a file from FTP and letting the user download it at the same time

I'm creating a backup system where backups will be generated automatically, so I will be storing backups on a different server. However, when I want to download them, I want the link to be a one-time link. This isn't hard to make, but to make it secure I was thinking about storing the files so they are not accessible via HTTP on the other server.
So what I would do is connect via FTP, download the file to the main server, then present it for download and delete it. However, this will take a long time if the backup is large. Is there a way to stream it from FTP without showing the person who is downloading the actual location, and without storing it on the server?
Here is a very basic example using cURL. It specifies a write callback which is called whenever data arrives from the FTP server and echoes that data to the browser, so the client's download happens while the FTP transfer from the backup server is still in progress.
This is a very basic example which you can expand on.
<?php
// ftp URL to file
$url = 'ftp://ftp.mozilla.org/pub/firefox/nightly/latest-firefox-3.6.x/firefox-3.6.29pre.en-US.linux-i686.tar.bz2';

// init curl session with FTP address
$ch = curl_init($url);

// specify a callback function for handling received data
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'writeCallback');

// send download headers for client
header('Content-type: application/octet-stream');
header('Content-Disposition: attachment; filename="backup.tar.bz2"');

// execute request, our write callback will be called as data arrives
curl_exec($ch);
curl_close($ch);

// write callback function, takes 2 params: the curl handle and the chunk of data just received
function writeCallback($curl, $data)
{
    // echo the chunk just received to the client, which contributes to their download
    echo $data;

    // return the number of bytes handled so cURL continues
    return strlen($data);
}
See curl_setopt() for more info on the CURLOPT_WRITEFUNCTION option.
