Uploading big files over HTTP

Uploading big files over HTTP - php

I need to upload potentially big (as in, 10's to 100's of megabytes) files from a desktop application to a server. The server code is written in PHP, the desktop application in C++/MFC. I want to be able to resume file uploads when the upload fails halfway through because this software will be used over unreliable connections. What are my options? I've found a number of HTTP upload components for C++, such as http://www.chilkatsoft.com/refdoc/vcCkUploadRef.html which looks excellent, but it doesn't seem to handle 'resume' of half done uploads (I assume this is because HTTP 1.1 doesn't support it). I've also looked at the BITS service but for uploads it requires an IIS server. So far my only option seems to be to cut up the file I want to upload into smaller pieces (say 1 meg each), upload them all to the server, reassemble them with PHP and run a checksum to see if everything went ok. To resume, I'd need to have some form of 'handshake' at the beginning of the upload to find out which pieces are already on the server. Will I have to code this by hand or does anyone know of a library that does all this for me, or maybe even a completely different solution? I'd rather not switch to another protocol that supports resume natively for maintenance reasons (potential problems with firewalls etc.)

I'm eight months late, but I just stumbled upon this question and was surprised that webDAV wasn't mentioned. You could use the HTTP PUT method to upload, and include a Content-Range header to handle resuming and such. A HEAD request would tell you if the file already exists and how big it is. So perhaps something like this:
1) HEAD the remote file
2) If it exists and size == local size, upload is already done
3) If size < local size, add a Content-Range header to request and seek to the appropriate location in local file.
4) Make PUT request to upload the file (or portion of the file, if resuming)
5) If connection fails during PUT request, start over with step 1
You can also list (PROPFIND) and rename (MOVE) files, and create directories (MKCOL) with dav.
I believe both Apache and Lighttpd have dav extensions.

You need a standard size (say 256k). If your file "abc.txt", uploaded by user x is 78.3MB it would be 313 full chunks and one smaller chunk.
You send a request to upload stating filename and size, as well as number of initial threads.
your php code will create a temp folder named after the IP address and filename,
Your app can then use MULTIPLE connections to send the data in different threads, so you could be sending chunks 1,111,212,313 at the same time (with separate checksums).
your php code saves them to different files and confirms reception after validating the checksum, giving the number of a new chunk to send, or to stop with this thread.
After all thread are finished, you would ask the php to join all the files, if something is missing, it would goto 3
You could increase or decrease the number of threads at will, since the app is controlling the sending.
You can easily show a progress indicator, either a simple progress bar, or something close to downthemall's detailed view of chunks.

libcurl (C api) could be a viable option
-C/--continue-at
Continue/Resume a previous file transfer at the given offset. The given offset is the exact number of bytes that will be skipped, counting from the beginning of the source file before it is transferred to the destination. If used with uploads, the FTP server command SIZE will not be used by curl.
Use "-C -" to tell curl to automatically find out where/how to resume the transfer. It then uses the given output/input files to figure that out.
If this option is used several times, the last one will be used

Google have created a Resumable HTTP Upload protocol. See https://developers.google.com/gdata/docs/resumable_upload

Is reversing the whole proccess an option? I mean, instead of pushing file over to the server make the server pull the file using standard HTTP GET with all bells and whistles (like accept-ranges, etc.).

Maybe the easiest method would be to create an upload page that would accept the filename and range in parameter, such as http://yourpage/.../upload.php?file=myfile&from=123456 and handle resumes in the client (maybe you could add a function to inspect which ranges the server has received)

# Anton Gogolev
Lol, I was just thinking about the same thing - reversing whole thing, making server a client, and client a server. Thx to Roel, why it wouldn't work, is clearer to me now.
# Roel
I would suggest implementing Java uploader [JumpLoader is good, with its JScript interface and even sample PHP server side code]. Flash uploaders suffer badly when it comes to BIIIGGG files :) , in a gigabyte scale that is.

F*EX can upload files up to TB range via HTTP and is able to resume after link failures.
It does not exactly meets your needs, because it is written in Perl and needs an UNIX based server, but the clients can be on any operating system. Maybe it is helpful for you nevertheless:
http://fex.rus.uni-stuttgart.de/

Exists the protocol called TUS for resumable uploads with some implementations in PHP and C++

Related

Process Uploaded file on web server without storing locally first?

I am trying to process the user uploaded file real time on the websever,
but it seems, APACHE invokes PHP, only once complete file is uploaded.
When i uploaded the file using CURL, and set
Transfer-Encoding : "Chunked"
I had some success, but can't do same thing via browser.
I used Dropzone.js but when i tried to set same header, it said Transfer -Encoding is an unsafe header, hence not setting it.
This answer explains what is the issue there.
Can't set Transfer-Encoding :"Chunked from Browser"
In a Nutshell problem is , when a user uploads the file to webserver, i want webserver to start processing it as soon as first byte is available.
by process i mean, PIPING it to a Named Pipe.
Dont want 500mb first getting uploaded to a server, then start processing it.
But with current Webserver (APACHE - PHP), I cant seem to be able to accomplish it.
could someone please explain, what technology stack or workarounds to use, so that i can upload the large file via browser and start processing it, as soon as first byte is available.

It is possible to use NodeJS/Multiparty to do that. Here they have an example of a direct upload to Amazon S3. This is the form, which sets content type to multipart/form-data. And here is the function for form parts processing. part parameter is of type ReadableStream, which will allow per-chunk processing of the input using data event.
More on readable streams in node js is here.

If you really want that (sorry don`t think thats a good idea) you should try looking for a FUSE Filesystem which does your job.
Maybe there is already one https://github.com/libfuse/libfuse/wiki/Filesystems
Or you should write your own.
But remember as soon as the upload is completed and the post script finishes his job the temp file will be deleted

you can upload file with html5 resumable upload tools (like Resumable.js) and process uploaded parts as soon as they received.
or as a workaround , you may find the path of uploaded file (usually in /tmp) and then write a background job to stream it to 3rd app. it may be harder.
there may be other solutions...

Detecting whether a file is complete or partial in PHP

In this system, SFTP will be used to upload large video files to a specific directory on the server. I am writing a PHP script that will be used to copy completely uploaded files to another directory.
My question is, what is a good way to tell if a file is completely uploaded or not? I considered taking note of the file size, sleeping the script for a few seconds, reading the size again, and comparing the two numbers. But this seems very kludgy.

Simple
If you just want a simple technique that will handle most use cases - write a process to check the last modified date or (as indicated in the question) the file size and if it hasn't changed in a set time, say 1 minute, assume the upload is finished and process it. Infact - your question is a near duplicate of another question showing exactly this.
Robust
An alternative to polling the size/last modified time of a file (which with a slow or intermittent connection could cause an open upload to be considered complete too early) is to listen for when the file has finished being written with inotify.
Listening for IN_CLOSE_WRITE you can know when an ftp upload has finished updating a specific file.
I've never used it in php, but from the spartan docs it looks that that aught to work in exactly the same way as the underlying lib.
Make FTP clients work for you
Take into account how some ftp programs work, from this comment:
it's worth mentioning that some ftp servers create a hidden temporary file to receive data until the transfer is finished. I.e. .foo-ftpupload-2837948723 in the same upload path as the target file
If applicable, from the finished filename you can therefore tell if the ftp upload has actually finished or was stopped/disconnected mid-upload.

In theory, in case you are allowed to run system commands from PHP you can try to get the pids of your sftp server processes with pidof command and then use lsof output to check if open file descriptors exist for the currently checked upload file.

During a transfer, the only thing that knows the real size of the file is your SFTP client and maybe the SFTP server (don't know protocol specifics so the server may know too.) Unless you can talk to the SFTP server directly, then you'll have to wait for the file to finish. Checking at some interval seems kludgy but under the circumstances is pretty good. It's what we do on my project.
If you have a very short poll interval, it's not terribly resilient when dealing with uploads that may stall for a few seconds. So just set the interval to one a minute or something like that. Assuming you're on a *nix based platform, you could set up a cron job to run every minute (or whatever your polling interval is) to run your script for you.

As #jcolebrand stated - if you have no control over upload process - there is actually not much you can do except guessing (size/date of file). I would look for a server software that allows you to execute hooks/scripts before/after some server action (here: file transfer complete). I am not sure though whether such software exists. As a last resort you could adapt some opensource SFTP server for your requirements by adding such a hook for yourself.

An interesting answer was mentioned in a commend under original question (but for some unknown reason the comment was deleted): you can try parsing SFTP server logs to find files that were uploaded completely.

Serving large file downloads from remote server

We have files that are hosted on RapidShare which we would like to serve through our own website. Basically, when a user requests http://site.com/download.php?file=whatever.txt, the script should stream the file from RapidShare to the user.
The only thing I'm having trouble getting my head around is how to properly stream it. I'd like to use cURL, but I'm not sure if I can read the download from RapidShare in chunks and then echo them to the user. The best way I've thought of so far is to use a combination of fopen, fread, echo'ing the chunk of the file to the user, flushing, and repeating that process until the entire file is transferred.
I'm aware of the PHP readfile() function aswell, but would that be the best option? Bear in mind that these files can be several GB's in size, and although we have servers with 16GB RAM I want to keep the memory usage as low as possible.
Thank you for any advice.

HTTP has a Header called "Range" which basically allows you to fetch any chunk of a file (knowing that you already know the file size), but since PHP isn't multi-threaded aware, I don't see any benefit of using it.
Afaik, if you don't want to consume all your RAM, the only way to go is a two steps way.
First, stream the remote file using fopen()/fread() (or any php functions which allow you to use stream), split the read in small chunks (2048 bits may be enough), write/append the result to a tempfile(), then "echoing" back to your user by reading the temporary file.
That way, even a file 2To would, basically, consumes 2048 bits since only the chunk and the handle of the file is in memory.
You may also write some kind of proxy manager to cache and keep already downloaded files to avoid the remote reading process if a file is heavily downloaded (and keep it locally for a given time).

Is uploading very large files (eg 500mb) via php advisable?

I created an simple web interface to allow various users to upload files. I set the upload limit to 100mb but now it turns out that the client occasionally wants to upload files 500mb+.
I know what to alter the php configuration to change the upload limit but I was wondering if there are any serious disadvantages to uploading files of this size via php?
Obviously ftp would be preferable but if possible i'd rather not have two different methods of uploading files.
Thanks

Firstly FTP is never preferable. To anything.
I assume you mean that you transferring the files via HTTP. While not quite as bad as FTP, its not a good idea if you can find another of solving the problem. HTTP (and hence the component programs) are optimized around transferring relatively small files around the internet.
While the protocol supports server to client range requests, it does not allow for the reverse operation. Even if the software at either end were unaffected by the volume, the more data you are pushing across the greater the interval during which you could lose the connection. But the biggest problem is that caveat in the last sentence.

Regardless of the server technology you use (PHP or something else) it's never a good idea to push that big file in one sweep in synchronous mode.
There are lots of plugins for any technology/framework that will do asynchronous upload for you.
Besides the connection timing out, there is one more disadvantage in that file uploading consumes the web server memory. You don't normally want that.

PHP will handle as many and as large a file as you'll allow it. But consider that it's basically impossible to resume an aborted upload in PHP, as scripts are not fired up until AFTER the upload is completed. The larger the file gets, the larger the chance of a network glitch killing the upload and wasting a good chunk of time and bandwidth. As well, without extra work with APC, or using something like uploadify, there's no progress report and users are left staring at a browser showing no visible signs of actual work except the throbber chugging away.

Upload 1GB files using chunking in PHP

I have a web application that accepts file uploads of up to 4 MB. The server side script is PHP and web server is NGINX. Many users have requested to increase this limit drastically to allow upload of video etc.
However there seems to be no easy solution for this problem with PHP. First, on the client side I am looking for something that would allow me to chunk files during transfer. SWFUpload does not seem to do that. I guess I can stream uploads using Java FX (http://blogs.oracle.com/rakeshmenonp/entry/javafx_upload_file) but I can not find any equivalent of request.getInputStream in PHP.
Increasing browser client_post limits or php.ini upload or max_execution times is not really a solution for really large files (~ 1GB) because maybe the browser will time out and think of all those blobs stored in memory.
Is there any way to solve this problem using PHP on server side? I would appreciate your replies.

plupload is a javascript/php library, and it's quite easy to use and allows chunking.
It uses HTML5 though.

Take a look at tus protocol which is a HTTP based protocol for resumable file uploads so you can carry on where you left off without re-uploading whole data again in case of any interruptions. This protocol has also been adopted by vimeo from May, 2017.
You can find various implementations of the protocol in different languages here. In your case, you can use its javascript client called uppy and use golang or php based server implementation in a server.

"but I can not find any equivalent of request.getInputStream in PHP. "
fopen('php://input'); perhaps?

I have created a JavaFX client to send large files in chunks of max post size (I am using 2 MB) and a PHP receiver script to assemble the chunks into original file. I am releasing the code under apache license here : http://code.google.com/p/gigaupload/
Feel free to use/modify/distribute.

Try using the bigupload script. It is very easy to integrate and can upload up to 2 Gb in chunks. The chunk size is customizable.

How about using a java applet for the uploading and PHP for processing..
You can find an example here for Jupload:
http://sourceforge.net/apps/mediawiki/jupload/index.php?title=PHP_Example

you can use this package
it supports resumable chunk upload.
in the examples/js-examples/resumable-chunk-upload example , you can close and re-open the browser and then resume not completed uploads.

You can definitely write a web app that will accept a block of data (even via a POST) then append that block of data to a file. It seems to me that you need some kind of client side app that will take a file and break it up into chunks, then send it to your web service one chunk at a time. However, it seems a lot easier to create an sftp dir, and let clients just sftp up files using some pre-existing client app.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Uploading big files over HTTP - php

Google have created a Resumable HTTP Upload protocol. See https://developers.google.com/gdata/docs/resumable_upload

Is reversing the whole proccess an option? I mean, instead of pushing file over to the server make the server pull the file using standard HTTP GET with all bells and whistles (like accept-ranges, etc.).

Maybe the easiest method would be to create an upload page that would accept the filename and range in parameter, such as http://yourpage/.../upload.php?file=myfile&from=123456 and handle resumes in the client (maybe you could add a function to inspect which ranges the server has received)

Exists the protocol called TUS for resumable uploads with some implementations in PHP and C++

Related

Process Uploaded file on web server without storing locally first?

Detecting whether a file is complete or partial in PHP

Serving large file downloads from remote server

Is uploading very large files (eg 500mb) via php advisable?

Upload 1GB files using chunking in PHP

Categories

Resources