Safe image download from PHP

Safe image download from PHP - php

I want to allow my users to upload a file by providing a URL to the image.
Pretty much like imgur, you enter http://something.com/image.png and the script downloads the file, then keeps it on the server and publishes it.
I tried using file_get_contents() and getimagesize(). But I'm thinking there would be problems:
how can I protect the script from 100 users supplying 100 URLs to large images?
how can I determine if the download process will take or already takes too long?

This is actually interesting.
It appears that you can actually track and control the progress of a cURL transfer. See documentation on CURLOPT_NOPROGRESS, CURLOPT_PROGRESSFUNCTION and CURLOPT_WRITEFUNCTION
I found this example and changed it to:
<?php
file_put_contents('progress.txt', '');
$target_file_name = 'targetfile.zip';
$target_file = fopen($target_file_name, 'w');
$ch = curl_init('http://localhost/so/testfile2.zip');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_NOPROGRESS, FALSE);
curl_setopt($ch, CURLOPT_PROGRESSFUNCTION, 'progress_callback');
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'write_callback');
curl_exec($ch);
if ($target_file) {
fclose($target_file);
}
$_download_size = 0;
function progress_callback($download_size, $downloaded_size, $upload_size, $uploaded_size) {
global $_download_size;
$_download_size = $download_size;
static $previous_progress = 0;
if ($download_size == 0) {
$progress = 0;
}
else {
$progress = round($downloaded_size * 100 / $download_size);
}
if ($progress > $previous_progress) {
$previous_progress = $progress;
$fp = fopen('progress.txt', 'a');
fputs($fp, $progress .'% ('. $downloaded_size .'/'. $download_size .")\n");
fclose($fp);
}
}
function write_callback($ch, $data) {
global $target_file_name;
global $target_file;
global $_download_size;
if ($_download_size > 1000000) {
return '';
}
return fwrite($target_file, $data);
}
write_callback checks whether the size of the data is greater than a specified limit. If it is, it returns an empty string that aborts the transfer. I tested this on 2 files with 80K and 33M, respectively, with a 1M limit. In your case, progress_callback is pointless beyond the second line, but I kept everything in there for debugging purposes.
One other way to get the size of the data is to do a HEAD request but I don't think that servers are required to send a Content-length header.

To answer question one, you simply need to add the appropriate limits in your code. Define how many requests you want to accept in a given amount of time, track your requests in a database, and go from there. Also put a cap on file size.
For question two, you can set appropriate timeouts if you use cURL.

Related

Upload large files to Dropbox via HTTP API

I am currently implementing an upload mechanism for files on my webserver into my Dropbox app directory.
As stated on the API docs, there is the /upload endpoint (https://www.dropbox.com/developers/documentation/http/documentation#files-upload) which accepts files up to 150MB in size. However I‘m dealing with images and videos with a potential size of up to 2GB.
Therefore I need to use the upload_session endpoints. There is an endpoint to start the session (https://www.dropbox.com/developers/documentation/http/documentation#files-upload_session-start), to append data and to finish the session.
What currently is unclear to me is how to exactly use these endpoints. Do I have to split my file on my server into 150MB chunks (how would I do that with a video file?) and then upload the first chunk with /start, the next chunks with /append and the last one with /finish? Or can I just specify the file and the API somehow (??) does the splitting for me? Obviously not, but I somehow can‘t get my head around on how I should calculate, split and store the chunks on my webserver and not lose the session inbetween...
Any advice or further leading links are greatly appreciated. Thank you!

As Greg mentioned in the comments, you decide how to manage the "chunks" of the files. In addition to his .NET example, Dropbox has a good upload session implementation in the JavaScript upload example of the Dropbox API v2 JavaScript SDK.
At a high-level, you're splitting up the file into smaller sizes (aka "chunks") and passing those to the upload_session mechanism in a specific order. The upload mechanism has a few parts that need to be used in the following order:
Call /files/upload_session/start. Use the resulting session_id as a parameter in the following methods so Dropbox knows which session you're interacting with.
Incrementally pass each "chunk" of the file to /files/upload_session/append_v2. A couple things to be aware of:
The first call will return a cursor, which is used to iterate over the file's chunks in a specific order. It gets passed as a parameter in each consecutive call to this method (with the cursor being updated on every response).
The final call must include the property "close": true, which closes the session so it can be uploaded.
Pass the final cursor (and commit info) to /files/upload_session/finish. If you see the new file metadata in the response, then you did it!!
If you're uploading many files instead of large ones, then the /files/upload_session/finish_batch and /files/upload_session/finish_batch/check are the way to go.

I know this is an old post, but here is a fully functional solution for your problem. Maybe anyone else finds it usefull. :)
<?php
$backup_folder = glob('/var/www/test_folder/*.{sql,gz,rar,zip}', GLOB_BRACE); // Accepted file types (sql,gz,rar,zip)
$token = '<ACCESS TOKEN>'; // Dropbox Access Token;
$append_url = 'https://content.dropboxapi.com/2/files/upload_session/append_v2';
$start_url = 'https://content.dropboxapi.com/2/files/upload_session/start';
$finish_url = 'https://content.dropboxapi.com/2/files/upload_session/finish';
if (!empty($backup_folder)) {
foreach ($backup_folder as $single_folder_file) {
$file_name= basename($single_folder_file); // File name
$destination_folder = 'destination_folder'; // Dropbox destination folder
$info_array = array();
$info_array["close"] = false;
$headers = array(
'Authorization: Bearer ' . $token,
'Content-Type: application/octet-stream',
'Dropbox-API-Arg: '.json_encode($info_array)
);
$chunk_size = 50000000; // 50mb
$fp = fopen($single_folder_file, 'rb');
$fileSize = filesize($single_folder_file); // File size
$tosend = $fileSize;
$first = $tosend > $chunk_size ? $chunk_size : $tosend;
$ch = curl_init($start_url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, fread($fp, $first));
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
$response = curl_exec($ch);
$tosend -= $first;
$resp = explode('"',$response);
$sesion = $resp[3];
$position = $first;
$info_array["cursor"] = array();
$info_array["cursor"]["session_id"] = $sesion;
while ($tosend > $chunk_size)
{
$info_array["cursor"]["offset"] = $position;
$headers[2] = 'Dropbox-API-Arg: '.json_encode($info_array);
curl_setopt($ch, CURLOPT_URL, $append_url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_POSTFIELDS, fread($fp, $chunk_size));
curl_exec($ch);
$tosend -= $chunk_size;
$position += $chunk_size;
}
unset($info_array["close"]);
$info_array["cursor"]["offset"] = $position;
$info_array["commit"] = array();
$info_array["commit"]["path"] = '/'. $destination_folder . '/' . $file_name;
$info_array["commit"]["mode"] = array();
$info_array["commit"]["mode"][".tag"] = "overwrite";
$info_array["commit"]["autorename"] = true;
$info_array["commit"]["mute"] = false;
$info_array["commit"]["strict_conflict"] = false;
$headers[2] = 'Dropbox-API-Arg: '. json_encode($info_array);
curl_setopt($ch, CURLOPT_URL, $finish_url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_POSTFIELDS, $tosend > 0 ? fread($fp, $tosend) : null);
curl_exec($ch);
curl_close($ch);
fclose($fp);
unlink($single_folder_file); // Remove files from server folder
}
}

Using fread to send chunks of large data

This is a bit of a continue from my previous thread (PHP CURL Chunked encoding a large file (700mb)) but I've now improvised something else.
Right now, I'm trying to use fread and then sending files through CURL chunk by chunk (each chunk around 1MB) and while the idea is good and it does work. It does timeout the server, so I was wondering if there was any way to reduce the amount of times it sends a chunk per second or a way to make it so it doesn't completely overload my PHP process.
$length = (1024 * 1024) * 1;
$handle = fopen($getFile, "r");
while (($buffer = fread($handle, $length)) !== false) {
if ($response = sendChunk($getServer, $buffer)) {
$chunk++;
print "Chunk " . $chunk . " Sent (Code: " . $response . ")! \n";
}
}
The function sendChunk is
function sendChunk($url, $chunk) {
$POST_DATA = [
'file' => base64_encode($chunk)
];
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_TIMEOUT, 2048);
curl_setopt($curl, CURLOPT_POST, 1);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_POSTFIELDS, $POST_DATA);
curl_exec($curl);
$response = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close ($curl);
return $response;
}
I tried making it so you can read the file line by line, but it doesn't work since a video file (mp4, wmv) is lots of random characters and what not.
UPDATE: I have discovered the issue and the timing out was actually a result of CloudFlare timing out when there's no such HTTP Response. So I decided to run the script using SSH and it worked fine .... except for one thing.
After the file does get successfully sent over it will just keep sending chunks of 0 bytes in this endless loop and I was told it was because feof() isn't always accurate in measuring that. So I tried using the ($buffer = fread($handle, $length) !== false) trick and it still repeats the same thing. Any ideas?

After working on this for around 8 hours, I noticed that I wasn't using $buffer to send the chunk so now I have done that.
while (!feof($fp) && ($buffer = fread($handle, $length)) !== false) {
if ($response = sendChunk($getServer, $buffer)) {
$chunk++;
print "Chunk " . $chunk . " Sent (Code: " . $response . ")! \n";
}
}
Everything works fine, I did some other touchups like check for a response code of 200. But the core of it works.
A lesson for anyone that is using Cloudflare and is wanting to transfer a file (Up to 2GB) to another server and want to use it via CURL.
There is better ways than just using CURL for this in my opinion, but client has requested it done via this way, but it works.
Cloudflare only has a maximum upload limit of 250MB for Free users, you cannot do chunked uploading through CURL's supported stream function as Cloudflare still reads it as > 250MB in the header.
When I managed to get this code to work, it would timeout on certain chunks and it was because Cloudflare needs an HTTP header within 100 seconds or it times out. Thankfully my script will be executed via CRON so it doesn't need to go through Cloudflare to work. However, if you are looking to execute code within the browser then you may want to take a look at this. https://github.com/marcialpaulg/Fixing-Cloudflare-Error-524

PHP: Get metadata of a remote .mp3 file

I am looking for a function that gets the metadata of a .mp3 file from a URL (NOT local .mp3 file on my server).
Also, I don't want to install http://php.net/manual/en/id3.installation.php or anything similar to my server.
I am looking for a standalone function.
Right now i am using this function:
<?php
function getfileinfo($remoteFile)
{
$url=$remoteFile;
$uuid=uniqid("designaeon_", true);
$file="../temp/".$uuid.".mp3";
$size=0;
$ch = curl_init($remoteFile);
//==============================Get Size==========================//
$contentLength = 'unknown';
$ch1 = curl_init($remoteFile);
curl_setopt($ch1, CURLOPT_NOBODY, true);
curl_setopt($ch1, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch1, CURLOPT_HEADER, true);
curl_setopt($ch1, CURLOPT_FOLLOWLOCATION, true); //not necessary unless the file redirects (like the PHP example we're using here)
$data = curl_exec($ch1);
curl_close($ch1);
if (preg_match('/Content-Length: (\d+)/', $data, $matches)) {
$contentLength = (int)$matches[1];
$size=$contentLength;
}
//==============================Get Size==========================//
if (!$fp = fopen($file, "wb")) {
echo 'Error opening temp file for binary writing';
return false;
} else if (!$urlp = fopen($url, "r")) {
echo 'Error opening URL for reading';
return false;
}
try {
$to_get = 65536; // 64 KB
$chunk_size = 4096; // Haven't bothered to tune this, maybe other values would work better??
$got = 0; $data = null;
// Grab the first 64 KB of the file
while(!feof($urlp) && $got < $to_get) { $data = $data . fgets($urlp, $chunk_size); $got += $chunk_size; } fwrite($fp, $data); // Grab the last 64 KB of the file, if we know how big it is. if ($size > 0) {
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RESUME_FROM, $size - $to_get);
curl_exec($ch);
// Now $fp should be the first and last 64KB of the file!!
#fclose($fp);
#fclose($urlp);
} catch (Exception $e) {
#fclose($fp);
#fclose($urlp);
echo 'Error transfering file using fopen and cURL !!';
return false;
}
$getID3 = new getID3;
$filename=$file;
$ThisFileInfo = $getID3->analyze($filename);
getid3_lib::CopyTagsToComments($ThisFileInfo);
unlink($file);
return $ThisFileInfo;
}
?>
This function downloads 64KB from a URL of an .mp3 file, then returns the array with the metadata by using getID3 function (which works on local .mp3 files only) and then deletes the 64KB's previously downloaded.
Problem with this function is that it is way too slow from its nature (downloads 64KB's per .mp3, imagine 1000 mp3 files.)
To make my question clear : I need a fast standalone function that reads metadata of a remote URL .mp3 file.

This function downloads 64KB from a URL of an .mp3 file, then returns the array with the metadata by using getID3 function (which works on local .mp3 files only) and then deletes the 64KB's previously downloaded. Problem with this function is that it is way too slow from its nature (downloads 64KB's per .mp3, imagine 1000 mp3 files.)
Yeah, well what do you propose? How do you expect to get data if you don't get data? There is no way to have a generic remote HTTP server send you that ID3 data. Really, there is no magic. Think about it.
What you're doing now is already pretty solid, except that it doesn't handle all versions of ID3 and won't work for files with more than 64KB of ID3 tags. What I would do to improve it to is to use multi-cURL.
There are several PHP classes available that make this easier:
https://github.com/jmathai/php-multi-curl
$mc = EpiCurl::getInstance();
$results[] = $mc->addUrl(/* Your stream URL here /*); // Run this in a loop, 10 at a time or so
foreach ($results as $result) {
// Do something with the data.
}

Exit out of a cURL fetch

I'm trying to find a way to only quickly access a file and then disconnect immediately.
So I've decided to use cURL since it's the fastest option for me. But I can't figure out how I should "disconnect" cURL.
With the code below, Apache's access logs says that the file I tried accessing was indeed accessed, but I'm feeling a little iffy about this, because when I just run the while loop without breaking out of it, it just keeps looping. Shouldn't the loop stop when cURL has finished fetching the file? Or am I just being silly; is the loop just restarting constantly?
<?php
$Resource = curl_init();
curl_setopt($Resource, CURLOPT_URL, '...');
curl_setopt($Resource, CURLOPT_HEADER, 0);
curl_setopt($Resource, CURLOPT_USERAGENT, '...');
while(curl_exec($Resource)){
break;
}
curl_close($Resource);
?>
I tried setting the CURLOPT_CONNECTTIMEOUT_MS / CURLOPT_CONNECTTIMEOUT options to very small values, but it didn't help in this case.
Is there a more "proper" way of doing this?

This statement is superflous:
while(curl_exec($Resource)){
break;
}
Instead just keep the return value for future reference:
$result = curl_exec($Resource);
The while loop does not help anything. So now to your question: You can tell curl that it should only take some bytes from the body and then quit. That can be achieved by reducing the CURLOPT_BUFFERSIZE to a small value and by using a callback function to tell curl it should stop:
$withCallback = array(
CURLOPT_BUFFERSIZE => 20, # ~ value of bytes you'd like to get
CURLOPT_WRITEFUNCTION => function($handle, $data) {
echo "WRITE: (", strlen($data), ") $data\n";
return 0;
},
);
$handle = curl_init("http://stackoverflow.com/");
curl_setopt_array($handle, $withCallback);
curl_exec($handle);
curl_close($handle);
Output:
WRITE: (10) <!DOCTYPE
Another alternative is to make a HEAD request by using CURLOPT_NOBODY which will never fetch the body. But it's not a GET request.
The connect timeout settings are about how long it will take until the connect times out. The connect is the phase until the server accepts input from curl and curl starts to know about that the server does. It's not related to the phase when curl fetches data from the server, that's
CURLOPT_TIMEOUT The maximum number of seconds to allow cURL functions to execute.
You find a long list of available options in the PHP Manual: curl_setoptDocs.

Perhaps that might be helpful?
$GLOBALS["dataread"] = 0;
define("MAX_DATA", 3000); // how many bytes should be read?
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch, CURLOPT_WRITEFUNCTION, "handlewrite");
curl_exec($ch);
curl_close($ch);
function handlewrite($ch, $data)
{
$GLOBALS["dataread"] += strlen($data);
echo "READ " . strlen($data) . " bytes\n";
if ($GLOBALS["dataread"] > MAX_DATA) {
return 0;
}
return strlen($data);
}

Exceeding twitter calls even though I've only made 1/hour max

I've just set up a website with a new host and uploaded bits and bobs only to find that my script to get a particular users tweets had broken. I was being told that i had exceeded my limit of 150 calls to the twitter api.
At the time this may have made sense as i was testing everything and reloading the page a lot. However I've made only a couple of attempts today but all i ever get is the same error. I even rewrote my code so that it cached all the tweets i asked for for an hour, meaning at most 1 call an hour but its still not changing. Heres the code im using to get, cache and retrieve tweets;
function getUrl($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
$content = curl_exec($ch);
curl_close($ch);
return $content;
}
function getContent($file, $url, $hours = 1)
{
//vars
$CurrentTime = time();
$ExpireTime = $hours * 60 * 60;
$FileTime = filemtime($file);
if(file_exists($file) && ($CurrentTime - $ExpireTime < $FileTime))
{
//echo 'returning from cached file';
return json_decode(file_get_contents($file), true);
}
else
{
$content = getUrl($url);
$fh = fopen($file, 'w');
fwrite($fh, $content);
fclose($fh);
//echo 'retrieved fresh from '.$url.':: '.$content;
return json_decode($content, true);
}
}
$NumTweets = 3;
$AccountName = "TWITTERUSERNAME"; //this can be any username
$URL = "https://api.twitter.com/1/statuses/user_timeline.json?include_entities=true&include_rts=true&screen_name=".$AccountName."&count=".$NumTweets;
$XML = getContent('./lib/inc/tweets.txt', $URL);
I then use the following to define the tweets and time they were created, then print them out in a similar manor;
for($i = 0; $i < $NumTweets; $i++)
{
$Tweet['text'][$i] = formatTwitString($XML[$i]['text']);
$Tweet['time'][$i] = formatTwitTime($XML[$i]['created_at']);
}
The format functions are irrelevant, i know they work fine.
The thing is, if i manually enter the $URL var that gets generated in my browser, i can load the tweets without a problem, its only when loading it through the website that i get the error, i tried copying the content using this method and saving it to a file on another webserver and replacing the twitter $URL with the other servers url and that worked fine,
so i dont think theres anything wrong with my code, but somehow my calls to twitter are getting rinsed, could this be other websites on the same host(iPage.com) using up these calls to twitter somehow?
I'm a bit lost, please help,
Thanks.

There's a chance your requests to Twitter are being made with the server's primary shared IP, causing your rate limiting to be shared with other users on the server. You can work around this by authenticating to Twitter when making the API requests.
https://dev.twitter.com/docs/rate-limiting#rest
By authenticating, you get your own rate limit regardless of the IP address used.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Safe image download from PHP - php

Related

Upload large files to Dropbox via HTTP API

Using fread to send chunks of large data

PHP: Get metadata of a remote .mp3 file

Exit out of a cURL fetch

Exceeding twitter calls even though I've only made 1/hour max

Categories

Resources