Upload large files to Dropbox via HTTP API - php

I am currently implementing a mechanism for uploading files from my web server into my Dropbox app directory.
As stated in the API docs, there is the /upload endpoint (https://www.dropbox.com/developers/documentation/http/documentation#files-upload), which accepts files up to 150 MB in size. However, I'm dealing with images and videos that can be up to 2 GB.
Therefore, I need to use the upload_session endpoints. There are endpoints to start the session (https://www.dropbox.com/developers/documentation/http/documentation#files-upload_session-start), to append data, and to finish the session.
What is currently unclear to me is how exactly to use these endpoints. Do I have to split my file on my server into 150 MB chunks (how would I do that with a video file?) and then upload the first chunk with /start, the next chunks with /append and the last one with /finish? Or can I just specify the file and the API somehow (??) does the splitting for me? Obviously not, but I somehow can't get my head around how I should calculate, split and store the chunks on my web server without losing the session in between...
Any advice or further leading links are greatly appreciated. Thank you!

As Greg mentioned in the comments, you decide how to manage the "chunks" of the files. In addition to his .NET example, Dropbox has a good upload session implementation in the JavaScript upload example of the Dropbox API v2 JavaScript SDK.
At a high-level, you're splitting up the file into smaller sizes (aka "chunks") and passing those to the upload_session mechanism in a specific order. The upload mechanism has a few parts that need to be used in the following order:
Call /files/upload_session/start. Use the resulting session_id as a parameter in the following methods so Dropbox knows which session you're interacting with.
Incrementally pass each "chunk" of the file to /files/upload_session/append_v2. A couple of things to be aware of:
Each call takes a cursor, which consists of the session_id and the offset (the number of bytes uploaded so far). The endpoint returns no data, so you advance the offset yourself by the size of each chunk you send. Dropbox rejects a call whose offset does not match what it has received so far, which is how the chunk order is enforced.
The last append can include the property "close": true, after which no more data can be appended to the session. For a sequential upload this is optional, because /finish can also carry the final chunk.
Pass the final cursor (and commit info) to /files/upload_session/finish. If you see the new file metadata in the response, then you did it!!
If you're uploading many files instead of large ones, then the /files/upload_session/finish_batch and /files/upload_session/finish_batch/check are the way to go.
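The ordering above boils down to simple offset arithmetic. The following PHP sketch computes, for a given file size and chunk size, which endpoint each chunk goes to and which offset to put in the cursor. It makes no HTTP calls. The function names plan_upload_session() and cursor_arg() are made up for illustration. It assumes the sequential pattern where the remainder travels with /finish.

```php
<?php
// Sketch: work out which upload_session endpoint receives each chunk of a
// file and at which cursor offset. Order: /start with the first chunk,
// /append_v2 for the middle chunks, /finish with whatever is left.
function plan_upload_session(int $fileSize, int $chunkSize): array
{
    if ($chunkSize <= 0 || $fileSize < 0) {
        throw new InvalidArgumentException('chunk size must be positive, file size non-negative');
    }
    // The first chunk (or the whole file, if it is small) goes to /start.
    $first = min($chunkSize, $fileSize);
    $plan = [['endpoint' => 'start', 'offset' => 0, 'length' => $first]];
    $offset = $first;
    // Keep appending full chunks while more than one chunk's worth remains.
    while ($fileSize - $offset > $chunkSize) {
        $plan[] = ['endpoint' => 'append_v2', 'offset' => $offset, 'length' => $chunkSize];
        $offset += $chunkSize;
    }
    // The remainder (possibly 0 bytes) is sent together with /finish.
    $plan[] = ['endpoint' => 'finish', 'offset' => $offset, 'length' => $fileSize - $offset];
    return $plan;
}

// The cursor is just the session_id plus the number of bytes already sent.
function cursor_arg(string $sessionId, int $offset): string
{
    return json_encode([
        'cursor' => ['session_id' => $sessionId, 'offset' => $offset],
        'close'  => false,
    ]);
}
```

For example, a 120 MB file with 50 MB chunks yields three calls: /start with bytes 0 to 50 MB, one /append_v2 at offset 50 MB, and /finish with the last 20 MB at offset 100 MB.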

I know this is an old post, but here is a fully functional solution to your problem. Maybe someone else will find it useful. :)
<?php
$backup_folder = glob('/var/www/test_folder/*.{sql,gz,rar,zip}', GLOB_BRACE); // Accepted file types (sql,gz,rar,zip)
$token = '<ACCESS TOKEN>'; // Dropbox access token
$append_url = 'https://content.dropboxapi.com/2/files/upload_session/append_v2';
$start_url = 'https://content.dropboxapi.com/2/files/upload_session/start';
$finish_url = 'https://content.dropboxapi.com/2/files/upload_session/finish';
$destination_folder = 'destination_folder'; // Dropbox destination folder
$chunk_size = 50000000; // 50 MB per request (must stay below the 150 MB limit)
if (!empty($backup_folder)) {
    foreach ($backup_folder as $single_folder_file) {
        $file_name = basename($single_folder_file); // File name
        $info_array = array();
        $info_array["close"] = false;
        $headers = array(
            'Authorization: Bearer ' . $token,
            'Content-Type: application/octet-stream',
            'Dropbox-API-Arg: ' . json_encode($info_array)
        );
        $fp = fopen($single_folder_file, 'rb');
        $fileSize = filesize($single_folder_file); // File size
        $tosend = $fileSize;
        $first = $tosend > $chunk_size ? $chunk_size : $tosend;
        // 1. Start the session with the first chunk.
        $ch = curl_init($start_url);
        curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, fread($fp, $first));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $response = curl_exec($ch);
        $tosend -= $first;
        // The response is JSON like {"session_id": "..."}; decode it instead of splitting on quotes.
        $session = json_decode($response, true)['session_id'] ?? null;
        if ($session === null) {
            // Could not start the session: skip this file and keep it on the server.
            curl_close($ch);
            fclose($fp);
            continue;
        }
        $position = $first;
        $info_array["cursor"] = array();
        $info_array["cursor"]["session_id"] = $session;
        // 2. Append the middle chunks; the cursor offset is the number of bytes sent so far.
        while ($tosend > $chunk_size) {
            $info_array["cursor"]["offset"] = $position;
            $headers[2] = 'Dropbox-API-Arg: ' . json_encode($info_array);
            curl_setopt($ch, CURLOPT_URL, $append_url);
            curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
            curl_setopt($ch, CURLOPT_POSTFIELDS, fread($fp, $chunk_size));
            curl_exec($ch);
            $tosend -= $chunk_size;
            $position += $chunk_size;
        }
        // 3. Finish the session with the remaining bytes and the commit info.
        unset($info_array["close"]);
        $info_array["cursor"]["offset"] = $position;
        $info_array["commit"] = array();
        $info_array["commit"]["path"] = '/' . $destination_folder . '/' . $file_name;
        $info_array["commit"]["mode"] = array();
        $info_array["commit"]["mode"][".tag"] = "overwrite";
        $info_array["commit"]["autorename"] = true;
        $info_array["commit"]["mute"] = false;
        $info_array["commit"]["strict_conflict"] = false;
        $headers[2] = 'Dropbox-API-Arg: ' . json_encode($info_array);
        curl_setopt($ch, CURLOPT_URL, $finish_url);
        curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $tosend > 0 ? fread($fp, $tosend) : '');
        curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        fclose($fp);
        if ($status === 200) {
            unlink($single_folder_file); // Remove the file from the server folder only after a successful upload
        }
    }
}
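As for how to split a video file: the code above simply reads the file with fread() in fixed-size pieces. Because it opens the file in binary mode ('rb'), the file type does not matter. The same idea can be wrapped in a generator so that only one chunk is held in memory at a time. read_chunks() is a hypothetical helper name. The chunk size you pass should stay below the 150 MB per-request limit mentioned in the question.

```php
<?php
// Yields successive chunks of at most $chunkSize bytes from a (binary) file,
// so a 2 GB video never has to be loaded into memory at once.
function read_chunks(string $path, int $chunkSize): Generator
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (!feof($fp)) {
            $chunk = fread($fp, $chunkSize);
            // An empty read means we are exactly at the end of the file.
            if ($chunk === false || $chunk === '') {
                break;
            }
            yield $chunk;
        }
    } finally {
        // Runs when the loop completes or the consumer stops iterating.
        fclose($fp);
    }
}
```

Each yielded string can be passed as CURLOPT_POSTFIELDS to /start, /append_v2 or /finish in turn, with the offset advanced by strlen() of the chunk.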

Related

Preventing hotlinking on Amazon S3 with PHP

On my site I am building an image protection section to cut the costs of Amazon S3. As part of that, I have made anti-hotlinking links for images using PHP (to the best of my understanding).
<video src="/media.php?id=711/video.mp4"></video>
Then my media.php file looks like:
if (isset($_SERVER['HTTP_REFERER']) && $_SERVER['HTTP_REFERER'] != 'example.com')
{
header('HTTP/1.1 503 Hot Linking Not Permitted');
header("Content-type: image/jpeg");
readfile("http://example.com/monkey.jpg");
exit;
}
$url = 'https://s3.example.com';
$s3 = "$url/$file";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $s3);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$results = explode("\n", trim(curl_exec($ch)));
foreach($results as $line) {
if (strtok($line, ':') == 'Content-Type') {
$parts = explode(":", $line);
$mime = trim($parts[1]);
}
}
header("Content-type: $mime");
readfile($s3);
To make it less obvious, I have set up a rewrite rule so that requests for cdn/711/video.mp4 are routed to media.php?id=711/video.mp4. That way, it doesn't look like there is a PHP script behind it.
RewriteRule ^cdn/([0-9a-zA-Z_]+)/([0-9a-zA-Z-\w.-]+)([\w.-]+)?$ media\.php\?id=$1/$2 [QSA,L]
The above system is working fine, but the issue is that when I load an image directly it takes 237 ms, whereas through this PHP script it takes 1.65 s.
I have shared all the code I have, so if there is room for improvement, please point me in the right direction so I can make changes accordingly.
The reason your script takes longer than querying s3 directly is that you've added a lot of overhead to the image request. Your webserver needs to download the image and then forward it to your users. That is almost definitely your biggest bottleneck.
The first thing I would suggest is using the S3 API. This still uses cURL under the hood, but it has optimizations that should nominally increase performance. It would also allow you to make your S3 bucket "private", which would make obscuring the S3 URL unnecessary.
All of that said, the recommended way to prevent hotlinking with AWS is to use CloudFront with referrer checking. AWS outlines how to do this in its documentation.
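A side note on the referer check in the question: $_SERVER['HTTP_REFERER'] holds a full URL such as https://example.com/gallery. Comparing it with the bare string 'example.com' therefore never matches, even for your own pages. Comparing only the host part is more reliable.

The helper below is a minimal sketch, and referer_allowed() is a hypothetical name. Like the original isset() check, it lets requests without a Referer through, since browsers and privacy tools often omit the header.

```php
<?php
// Hypothetical helper: true when the Referer is absent or its host is one
// of the allowed hosts (compared case-insensitively).
function referer_allowed(?string $referer, array $allowedHosts): bool
{
    if ($referer === null || $referer === '') {
        return true; // direct requests and stripped referers are allowed
    }
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // not an absolute URL, so there is no host to compare
    }
    return in_array(strtolower($host), array_map('strtolower', $allowedHosts), true);
}
```

In media.php this might be used as: if (!referer_allowed($_SERVER['HTTP_REFERER'] ?? null, ['example.com', 'www.example.com'])) { http_response_code(403); exit; }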
If you don't want to refactor your infrastructure, the best way to improve performance is likely to implement a local cache. At its most basic, that would look something like this:
$cacheDir = '/path/to/a/local/directory/';
$cacheFileName = str_replace('/', '_', $file);
if (file_exists($cacheDir . $cacheFileName)) {
    $mime = mime_content_type($cacheDir . $cacheFileName);
    $content = file_get_contents($cacheDir . $cacheFileName);
} else {
    $url = 'https://s3.example.com';
    $s3 = "$url/$file";
    $mime = 'application/octet-stream'; // fallback if no Content-Type header is returned
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $s3);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_NOBODY, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $results = explode("\n", trim(curl_exec($ch)));
    curl_close($ch);
    foreach ($results as $line) {
        // Header names are case-insensitive (HTTP/2 sends them in lowercase).
        if (strcasecmp(strtok($line, ':'), 'Content-Type') === 0) {
            $parts = explode(":", $line, 2);
            $mime = trim($parts[1]);
        }
    }
    $content = file_get_contents($s3);
    file_put_contents($cacheDir . $cacheFileName, $content);
}
header("Content-type: $mime");
echo $content;
This stores a copy of the file locally so that the server does not need to download it from S3 every time it is requested. That should reduce your overhead somewhat, though it will not do as well as a purely AWS-based solution. With this solution you'll also have to add ways of cache-breaking, periodically expiring the cache, etc. Just to reiterate, you shouldn't just copy/paste this into a production environment; it is a start, but it is more a proof of concept than production-ready code.
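Two of those missing pieces can be sketched briefly. Both helper names, cache_path() and cache_is_fresh(), are hypothetical, and the time-to-live you pass is your own choice.

cache_path() hashes the S3 key instead of replacing slashes. With str_replace('/', '_', ...), the keys "711/a.mp4" and "711_a.mp4" would share one cache entry. cache_is_fresh() implements a simple time-based expiry using the cached file's modification time.

```php
<?php
// Map an S3 key to a cache file name; hashing avoids collisions between
// keys that differ only in "/" versus "_". The extension is kept for readability.
function cache_path(string $cacheDir, string $key): string
{
    $ext = pathinfo($key, PATHINFO_EXTENSION);
    return rtrim($cacheDir, '/') . '/' . sha1($key) . ($ext !== '' ? '.' . $ext : '');
}

// True if the cached copy exists and was written less than $ttlSeconds ago.
// $now can be injected for testing; it defaults to the current time.
function cache_is_fresh(string $path, int $ttlSeconds, ?int $now = null): bool
{
    clearstatcache(true, $path); // filemtime() results are cached by PHP
    if (!is_file($path)) {
        return false;
    }
    $now = $now ?? time();
    return ($now - filemtime($path)) < $ttlSeconds;
}
```

In the code above, the file_exists() test could become cache_is_fresh($path, 3600) with $path = cache_path($cacheDir, $file). A stale file would then simply be re-downloaded and overwritten.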

Best bulk image download practise using curl?

I have a script running in a Laravel 5.4 web application that is supposed to download a large number of images (10k). I'm wondering what the best way to handle this would be. I currently grab the base64_encode() data of the remote image and write it to a local folder with file_put_contents(). This works fine, but some images can take more than 10 seconds to download/write; imagine that times 10 thousand. Fair enough, these images are rather big, but I would like this process to happen faster, and thus I am asking for advice!
My current process is as follows:
I read a JSON file containing all the image links I have to download.
I convert the JSON data to an array with json_decode() and I loop through all the links with a foreach() loop and let curl handle the rest.
All the relevant parts of the code look like this:
<?php
// Defining the paths for easy access.
$__filePath = public_path() . DIRECTORY_SEPARATOR . "importImages" . DIRECTORY_SEPARATOR . "images" . DIRECTORY_SEPARATOR . "downloadList.json";
$__imagePath = public_path() . DIRECTORY_SEPARATOR . "importImages" . DIRECTORY_SEPARATOR . "images";
// Decode the json array into an array readable by PHP.
$this->imagesToDownloadList = json_decode(file_get_contents($__filePath));
// Let's loop through the image list and try to download
// all of the images that are present within the array.
foreach ($this->imagesToDownloadList as $IAN => $imageData) {
    $__imageGetContents = $this->curl_get_contents($imageData->url);
    $__imageBase64 = ($__imageGetContents) ? base64_encode($__imageGetContents) : false;
    if (!file_put_contents($__imagePath . DIRECTORY_SEPARATOR . $imageData->filename, base64_decode($__imageBase64))) {
        return false;
    }
}
// Return true only after the loop; returning inside it would stop after the first image.
return true;
And the curl_get_contents function looks like this:
<?php
private function curl_get_contents($url)
{
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
I hope someone can enlighten me with possible improvements to the way I'm currently handling this mass download.
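One commonly used way to speed up a download like this in PHP is to run several transfers in parallel with the curl_multi_* functions. Each response can be streamed straight to disk with CURLOPT_FILE instead of being held in memory. That also makes the base64_encode()/base64_decode() round trip unnecessary, since it returns the same bytes.

The sketch below is an illustration, not a tuned implementation. download_parallel() and its $jobs format (target path => URL) are assumptions, a batch of 10 concurrent transfers is an arbitrary starting point, and it requires PHP 7.1+.

```php
<?php
// Sketch: download many URLs in parallel batches with curl_multi, writing
// each response directly to its target file. Returns the paths that failed.
function download_parallel(array $jobs, int $concurrency = 10, int $timeout = 30): array
{
    $failed = [];
    foreach (array_chunk($jobs, $concurrency, true) as $batch) {
        $mh = curl_multi_init();
        $handles = [];
        foreach ($batch as $path => $url) {
            $fp = fopen($path, 'wb');
            if ($fp === false) {
                $failed[] = $path;
                continue;
            }
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_FILE, $fp);          // stream body to disk
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
            curl_multi_add_handle($mh, $ch);
            $handles[$path] = [$ch, $fp];
        }
        // Drive all transfers in this batch until none is still running.
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running) {
                curl_multi_select($mh); // wait for activity instead of spinning
            }
        } while ($running && $status === CURLM_OK);
        foreach ($handles as $path => [$ch, $fp]) {
            if (curl_errno($ch) !== 0 || curl_getinfo($ch, CURLINFO_HTTP_CODE) !== 200) {
                $failed[] = $path;
            }
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            fclose($fp);
        }
        curl_multi_close($mh);
    }
    return $failed;
}
```

Failed paths may contain partial files, so you would probably delete or retry them.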

What would be the best way to collect the titles (in bulk) of a subreddit

I am looking to collect the titles of all of the posts on a subreddit, and I wanted to know what would be the best way of going about this?
I've looked around and found some stuff talking about Python and bots. I've also had a brief look at the API and am unsure in which direction to go.
As I do not want to commit to something only to find out 90% of the way through that it won't work, I'm asking if someone could point me in the right direction regarding language and extras, for example any software needed, such as pip for Python.
My own experience is in web languages such as PHP, so I initially thought a web app would do the trick, but I am unsure whether this would be the best way and how to go about it.
So, as my question stands:
What would be the best way to collect the titles (in bulk) of a subreddit?
Or, if that is too subjective:
How do I retrieve and store all the post titles of a subreddit?
Preferably, it needs to:
do more than 1 page of (25) results
save to a .txt file
Thanks in advance.
PHP, in about 25 lines:
$subreddit = 'pokemon';
$max_pages = 10;
// Set variables with default data
$page = 0;
$after = '';
$titles = '';
do {
    $url = 'https://www.reddit.com/r/' . $subreddit . '/new.json?limit=25&after=' . rawurlencode($after);
    // Set URL you want to fetch
    $ch = curl_init($url);
    // Set curl option of header to false (don't need them)
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Set curl option of nobody to false as we need the body
    curl_setopt($ch, CURLOPT_NOBODY, 0);
    // Set curl timeout of 5 seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    // Set curl to return output as string
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // Reddit asks API clients to send a descriptive User-Agent (placeholder value)
    curl_setopt($ch, CURLOPT_USERAGENT, 'php:subreddit-titles:v1.0 (by /u/your_username)');
    // Execute curl
    $output = curl_exec($ch);
    // Get HTTP code of request
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // Close curl
    curl_close($ch);
    // If http code is 200 (success)
    if ($status == 200) {
        // Decode JSON into PHP object
        $json = json_decode($output);
        // Set after for next curl iteration (reddit's pagination)
        $after = $json->data->after;
        // Loop through each post and output title
        foreach ($json->data->children as $k => $v) {
            $titles .= $v->data->title . "\n";
        }
        // A null "after" means there are no more pages; stop instead of starting over
        if ($after === null) {
            break;
        }
    }
    // Increment page number
    $page++;
    // Loop through whilst current page number is less than maximum pages
} while ($page < $max_pages);
// Save titles to text file
file_put_contents(dirname(__FILE__) . '/' . $subreddit . '.txt', $titles);
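The parsing step of the loop above can be pulled out into a function, which makes it easy to test without hitting reddit. parse_listing() is a hypothetical helper. It reads the same fields the answer uses (data.children[].data.title and data.after). It returns null for after when the listing has no further pages.

```php
<?php
// Extracts post titles and the pagination token from one page of a
// subreddit listing. Returns [array $titles, ?string $after].
function parse_listing(string $json): array
{
    $decoded = json_decode($json, true);
    if (!is_array($decoded) || !isset($decoded['data']['children'])) {
        return [[], null]; // not a listing (e.g. an error page)
    }
    $titles = [];
    foreach ($decoded['data']['children'] as $child) {
        if (isset($child['data']['title'])) {
            $titles[] = $child['data']['title'];
        }
    }
    // A null "after" means there are no further pages to request.
    return [$titles, $decoded['data']['after'] ?? null];
}
```

Inside the loop, [$pageTitles, $after] = parse_listing($output); would replace the manual decoding, followed by the same "stop when $after is null" check.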

PHP: Get metadata of a remote .mp3 file

I am looking for a function that gets the metadata of a .mp3 file from a URL (NOT local .mp3 file on my server).
Also, I don't want to install the ID3 extension (http://php.net/manual/en/id3.installation.php) or anything similar on my server.
I am looking for a standalone function.
Right now I am using this function:
<?php
function getfileinfo($remoteFile)
{
$url=$remoteFile;
$uuid=uniqid("designaeon_", true);
$file="../temp/".$uuid.".mp3";
$size=0;
$ch = curl_init($remoteFile);
//==============================Get Size==========================//
$contentLength = 'unknown';
$ch1 = curl_init($remoteFile);
curl_setopt($ch1, CURLOPT_NOBODY, true);
curl_setopt($ch1, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch1, CURLOPT_HEADER, true);
curl_setopt($ch1, CURLOPT_FOLLOWLOCATION, true); //not necessary unless the file redirects (like the PHP example we're using here)
$data = curl_exec($ch1);
curl_close($ch1);
if (preg_match('/Content-Length: (\d+)/', $data, $matches)) {
$contentLength = (int)$matches[1];
$size=$contentLength;
}
//==============================Get Size==========================//
if (!$fp = fopen($file, "wb")) {
echo 'Error opening temp file for binary writing';
return false;
} else if (!$urlp = fopen($url, "r")) {
echo 'Error opening URL for reading';
return false;
}
try {
$to_get = 65536; // 64 KB
$chunk_size = 4096; // Haven't bothered to tune this, maybe other values would work better??
$got = 0; $data = '';
// Grab the first 64 KB of the file (fread() is binary-safe, unlike fgets())
while (!feof($urlp) && $got < $to_get) {
$piece = fread($urlp, $chunk_size);
if ($piece === false) {
break;
}
$data .= $piece;
$got += strlen($piece); // count the bytes actually received, not the requested size
}
fwrite($fp, $data);
// Grab the last 64 KB of the file, if we know how big it is.
if ($size > 0) {
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RESUME_FROM, max(0, $size - $to_get));
curl_exec($ch);
}
curl_close($ch);
// Now $fp should be the first and last 64KB of the file!!
fclose($fp);
fclose($urlp);
} catch (Exception $e) {
fclose($fp);
fclose($urlp);
echo 'Error transferring file using fopen and cURL !!';
return false;
}
$getID3 = new getID3;
$filename=$file;
$ThisFileInfo = $getID3->analyze($filename);
getid3_lib::CopyTagsToComments($ThisFileInfo);
unlink($file);
return $ThisFileInfo;
}
?>
This function downloads 64 KB of the .mp3 file from the URL, returns the array with the metadata using getID3 (which only works on local .mp3 files), and then deletes the 64 KB it previously downloaded.
The problem with this function is that it is inherently slow (it downloads 64 KB per .mp3; imagine 1000 .mp3 files).
To make my question clear: I need a fast standalone function that reads the metadata of a remote .mp3 file from its URL.
This function downloads 64KB from a URL of an .mp3 file, then returns the array with the metadata by using getID3 function (which works on local .mp3 files only) and then deletes the 64KB's previously downloaded. Problem with this function is that it is way too slow from its nature (downloads 64KB's per .mp3, imagine 1000 mp3 files.)
Yeah, well what do you propose? How do you expect to get data if you don't get data? There is no way to have a generic remote HTTP server send you that ID3 data. Really, there is no magic. Think about it.
What you're doing now is already pretty solid, except that it doesn't handle all versions of ID3 and won't work for files with more than 64 KB of ID3 tags. To improve it, I would use multi-cURL so that several files are fetched in parallel.
There are several PHP classes available that make this easier:
https://github.com/jmathai/php-multi-curl
$mc = EpiCurl::getInstance();
$results[] = $mc->addUrl(/* Your stream URL here */); // Run this in a loop, 10 at a time or so
foreach ($results as $result) {
// Do something with the data.
}
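On the 64 KB limit: an ID3v2 tag announces its own size in its first 10 bytes. You can fetch those bytes first (for example with a Range: bytes=0-9 request, assuming the server honours range requests) and then request exactly as many bytes as the tag occupies. ID3v1 tags, by contrast, sit in the last 128 bytes of the file.

The function below only decodes the header and does no HTTP. id3v2_total_size() is a hypothetical name.

```php
<?php
// Given at least the first 10 bytes of an MP3, returns how many bytes the
// ID3v2 tag occupies in total (header + tag body + optional footer), or
// null if the data does not start with an ID3v2 header.
function id3v2_total_size(string $header): ?int
{
    if (strlen($header) < 10 || substr($header, 0, 3) !== 'ID3') {
        return null;
    }
    // Byte 3: major version, byte 4: revision, byte 5: flags.
    $flags = ord($header[5]);
    // Bytes 6-9: a 28-bit "syncsafe" integer, 7 significant bits per byte.
    $size = 0;
    for ($i = 6; $i < 10; $i++) {
        $size = ($size << 7) | (ord($header[$i]) & 0x7F);
    }
    // Flag bit 0x10 (ID3v2.4) signals an extra 10-byte footer.
    $footer = ($flags & 0x10) ? 10 : 0;
    return 10 + $size + $footer;
}
```

With the result, a second request such as Range: bytes=0-(size - 1) retrieves the whole tag regardless of how large it is. That file can then be handed to getID3 as before.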

Safe image download from PHP

I want to allow my users to upload a file by providing a URL to the image.
Pretty much like imgur, you enter http://something.com/image.png and the script downloads the file, then keeps it on the server and publishes it.
I tried using file_get_contents() and getimagesize(). But I'm thinking there would be problems:
how can I protect the script from 100 users supplying 100 URLs to large images?
how can I determine if the download process will take or already takes too long?
This is actually interesting.
It appears that you can actually track and control the progress of a cURL transfer. See documentation on CURLOPT_NOPROGRESS, CURLOPT_PROGRESSFUNCTION and CURLOPT_WRITEFUNCTION
I found this example and changed it to:
<?php
file_put_contents('progress.txt', '');
$target_file_name = 'targetfile.zip';
$target_file = fopen($target_file_name, 'w');
$ch = curl_init('http://localhost/so/testfile2.zip');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_NOPROGRESS, FALSE);
curl_setopt($ch, CURLOPT_PROGRESSFUNCTION, 'progress_callback');
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'write_callback');
curl_exec($ch);
if ($target_file) {
fclose($target_file);
}
$_download_size = 0;
// Since PHP 5.5 the cURL handle is passed as the first argument.
function progress_callback($resource, $download_size, $downloaded_size, $upload_size, $uploaded_size) {
global $_download_size;
$_download_size = $download_size;
static $previous_progress = 0;
if ($download_size == 0) {
$progress = 0;
}
else {
$progress = round($downloaded_size * 100 / $download_size);
}
if ($progress > $previous_progress) {
$previous_progress = $progress;
$fp = fopen('progress.txt', 'a');
fputs($fp, $progress .'% ('. $downloaded_size .'/'. $download_size .")\n");
fclose($fp);
}
}
function write_callback($ch, $data) {
global $target_file_name;
global $target_file;
global $_download_size;
if ($_download_size > 1000000) {
return '';
}
return fwrite($target_file, $data);
}
write_callback checks whether the download size reported by cURL (stored by progress_callback) is greater than a specified limit. If it is, it returns an empty string, which cURL treats as 0 bytes written, and that aborts the transfer. I tested this on 2 files of 80K and 33M, respectively, with a 1M limit. In your case, progress_callback is pointless beyond the second line, but I kept everything in there for debugging purposes.
One other way to get the size of the data is to do a HEAD request, but servers are not required to send a Content-Length header.
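The write_callback above relies on the size reported to progress_callback, which stays 0 when the server sends no Content-Length. A variation is to count the bytes actually received inside the write callback itself.

The sketch below returns a closure suitable for CURLOPT_WRITEFUNCTION; make_limited_writer() is a hypothetical name. cURL aborts the transfer as soon as the callback returns a number different from the length of the data it was given.

```php
<?php
// Builds a CURLOPT_WRITEFUNCTION callback that writes to $fp but makes cURL
// abort once more than $maxBytes have been received in total.
function make_limited_writer($fp, int $maxBytes): callable
{
    $received = 0;
    return function ($ch, string $data) use ($fp, $maxBytes, &$received): int {
        $received += strlen($data);
        if ($received > $maxBytes) {
            return 0; // anything other than strlen($data) aborts the transfer
        }
        $written = fwrite($fp, $data);
        return $written === false ? 0 : $written;
    };
}
```

Usage: curl_setopt($ch, CURLOPT_WRITEFUNCTION, make_limited_writer($target_file, 1000000)). After curl_exec(), curl_errno($ch) reports a write error (CURLE_WRITE_ERROR) when the limit was hit, and the partial file can be deleted.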
To answer question one, you simply need to add the appropriate limits in your code. Define how many requests you want to accept in a given amount of time, track your requests in a database, and go from there. Also put a cap on file size.
For question two, you can set appropriate timeouts if you use cURL.
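The "track your requests and limit them" idea from question one can be expressed as a sliding-window check. The sketch below keeps the timestamps in a plain array for illustration; in practice they would come from the database table mentioned above. allow_request() and its parameters are hypothetical, and the arrow function requires PHP 7.4+. For question two, cURL's CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT options cap the connection time and the total transfer time respectively.

```php
<?php
// Sliding-window rate limit: given the timestamps (in seconds) of a user's
// previous requests, decide whether one more request at $now is allowed
// under "$limit requests per $windowSeconds".
function allow_request(array $timestamps, int $now, int $limit, int $windowSeconds): bool
{
    $recent = array_filter(
        $timestamps,
        fn (int $t): bool => $t > $now - $windowSeconds // only count requests inside the window
    );
    return count($recent) < $limit;
}
```

With a limit of 3 per 60 seconds, a user who made requests at t=150, 170 and 190 is refused at t=200, but allowed again once the oldest of those falls outside the window.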
