What is a good way to upload multiple large files in PHP?
Note: in my case I don't really need very large files, I need something like 40-50 or maybe 100 MB upload. My purpose and goal is mainly about documents upload in websites, these documents (pdf, doc, etc.) can sometimes be 20-30 MB and rarely 100+MB, where normally the php MAX_UPLOAD_FILESIZE is 10-20 MB. I think that having a large MAX_UPLOAD_FILESIZE is very bad and anyway allowing large file uploads in PHP (like 1GB+) is really a bad idea.
I have been reading, even here on SO, different solutions like plupload and some others (HTTP Upload, bigUpload, etc.) and I am not sure which way is better to consider.
As a principle, I'd like to find something with mantained code (not an abandoned library) and possibly following coding standards (PSR).
I think that writing all from scratch would be a huge work, but maybe I am wrong, if someone did that I would like to hear your experience. Of course if I can't find something that gives me what I need, I'll have to edit existing libraries or write one on my own.
I will give a try to plupload but my biggest concern is that it might not be well mantained. There is a v3 release but the stable is still the v2. If I understand correctly, the last updates of this library on GitHub were made in 2017 and this doesn't really reassure me. PHP but everything in general changes really quickly.
Problem 1: Server knows the size of the file only when the upload is finished
I think this is the first problem. I can't be sure about the file size until the upload is completed. If I have to give an error in PHP because the file is too big, I'd need to upload the file before. I could guess the size in javascript but I think that would be too hackable. Same thing for HTML5. All client-side checks could be "hacked" or manipulated even if it requires work, it is still doable
Problem 2: File chunking, is it the best solution?
I have read about file chunking, which is very interesting, but that is only client-side, right? Because to chunk a file on server-side, you go back to problem 1, you need to upload the whole file before chunking it. Is client-side chunking safe? Are there known vulnerabilities about it? (an example, could I exploit something out while the file is being uploaded?). I will read more about this later.
Also what about the time required to upload? Let's say you are using your phone in 4G to upload two PDFs or JPGs on a form, you have to upload 20-30 MB with a bad signal in that moment, let's say you take 30 seconds or 1 minute to upload them. Will the server stop listening after a while?
In any case, with file chunking, do I need to have a large MAX_FILESIZE_UPLOAD? Is usually the request sent (as a POST) with the entire file, which could return an error because the size exceeds the PHP limit? I would like to keep a normal PHP file size limit.
Problem 3: stopping file upload and deleting interrupted upload
What happens if I stop or I want to stop my upload? I guess that the server will find itself with a temporary file which is only partially uploaded.
Then, in general, is the best way to have a cron checking the /tmp (or whatever) folder and deleting uncompleted files?
Problem 4: overwriting interrupted uploads
What if the user uploads a new file that should overwrite the old one, but the old one was not completed? Let's say, user uploads a 10MB document but realizes it's the wrong one. He will probably reload the page or maybe click "Browse" again and upload the new one. So, all the temporary files should have a unique name. If I remember correctly, PHP already gives them a random name in /tmp/. Is this enough? Would it be better to manually give them a random name, maybe based on a timestamp?
In code words, something like:
$fileName = time() . '-' . uniqid() . '.tmp';
Problem 5: what about multiple files upload?
Let's say the user uploads 3 documents, 10 MB each one. Should the server receive them all at the same time or one by one? Or maybe is it the same? At a first look, I might think that multiple uploads at the same time could give more problems with server load. Maybe this is not that important though.
Problem 6: accessibility
Would all this be easily accessible? Do you think that a user using a screen-reader would be able to upload multiple (and possibly large) files without problems?
Conclusion
In conclusion I will take a look at existing libraries and test a bit if I can find a good solution for my needs. In general I would like to read comments or experiences that could help me understand possible difficulties and problems that I could encounter.
I will try to help you in your questions:
Problem 1: Server knows the size of the file only when the upload is finished
You need to configure upload_max_filesize and post_max_size to allow the maximum file upload you want to be able to receive in a single request.
And yes, you will know the size once the file has been uploaded, as your script will be executed once the file is completely uploaded.
You can have some kind of checking in your javascript to improve your UI to the customer.
Also if the file exceeds the maximum size, the file will not be available in your server script.
Problem 2: File chunking, is it the best solution?
I don't think this would be any better than uploading the whole file.
The speed will not be increased, so although you can paralelize the upload, the upload speed will be determined by the client speed, you will end up uploading 1 file at 10Mb/s or 10 files at 1Mb/s, at the end will be the same.
Also you will have a very complex code at the client and at the server to handle it, and you will have to handle new error scenarios.
It is not worth it.
Problem 3: stopping file upload and deleting interrupted upload
If the client stops the upload, your server code will not be executed as the request has not been completed.
Also you won't have any troubles with duplicated tmp files, the file is removed from tmp folder when you move it using move_uploaded_file and if you don't move the file will be deleted when your script has finished.
https://www.php.net/manual/en/features.file-upload.post-method.php
Problem 5: what about multiple files upload?
With multiple file uploads, i prefer to use diferent requests via javascript for each file, then you can paralelize or start uploading files one by one and show the user the upload process.
With javascript you can see the upload percent of each request, and it is very useful when uploading big files to give the user the feedback that the application is working properly.
Problem 6: accessibility
You will handle the same accessibility issues than uploading a small file.
Conclusion
The server side coding is independent of the file size you are trying to upload, as it just does some checks and moves the file to the proper location.
You will have to code some javascript to improve the ui to show the progress to the customer, and if you want to add the ability to cancel the upload, it is very easy as it is just an http request with a listener that updates the progress (https://stackoverflow.com/a/47638378/1445024)
Related
I'm trying to convert a website to use S3 storage instead of local (expensive) disk storage. I solved the download problem using a stream wrapper interface on the S3Client. The upload problem is harder.
It seems to me that when I post to a PHP endpoint, the $_FILES object is already populated and copied to /tmp/ before I can even intercept it!
On top of that, the S3Client->upload() expects a file on the disk already!
Seems like a double-whammy against what I'm trying to do, and most advice I've found uses NodeJS or Java streaming so I don't know how to translate.
It would be better if I could intercept the code that populates $_FILES and then send up 5MB chunks from memory with the S3\ObjectUploader, but how do you crack open the PHP multipart handler?
Thoughts?
EDIT: It is a very low quantity of files, 0-20 per day, mostly 1-5MB sometimes hitting 40~70MB. Periodically (once every few weeks) a 1-2GB file will be uploaded. Hence the desire to move off an EC2 instance and into heroku/beanstalk type PaaS where I won't have much /tmp/ space.
It's hard to comment on your specific situation without knowing the performance requirements of the application and the volume of users needed to access it so I'll try to answer assuming a basic web app uploading profile avatars.
There are some good reasons for this, the file is streamed to the disk for multiple purposes one of which is to conserve memory use. If your file is not on the disk than it is in memory(think disk usage is expensive? bump up your memory usage and see how expensive that gets), which is fine for a single user uploading a small file, but not so great for a bunch of users uploading small files or worse: large files. You'll likely see the best performance if you use the defaults on these libraries and let them stream to and from the disk.
But again I don't know your use case and you may actually need to avoid the disk at all costs for some unknown reason.
My application requires downloading many images from server(each image about 10kb large). And I'm simply downloading each of them with independent AsyncTask without any optimization.
Now I'm wondering what's the common practice to transfer these images. For example, I'm thinking about saving zipped images at server, then send zipped file for user's mobile to unzip. In this case, is it better to combine the zip files into one big zip file for user to download?
Or there's better solution? Thanks in advance!
EDIT:
It seems combining zip files is a good idea, but I feel it may take too long for user to wait downloading and unzipping all images. So I may put ten or twenty images in each zip file, so user can see some downloaded ones while waiting for more to come. Having multiple AsyncTask fired together can be faster right? But they won't finish at the same time even given same file size and same address to download?
Since latency is often the largest problem with mobile connections, reducing the number of connections you have to open is a great way to optimize the loading times. Sending a zip file with all the images sounds like a very good idea, and is probably worth the time implementing.
Images probably are already compressed (gif, jpg, png). You will not reduce filesize but will reduce the number of connections. Which is a good idea for mobile. If it is always the same set of images you can use some sprite technology (sending one bigger image file containing all the images but with different x/y offset, in html you can use the backround with an offset to show the right image).
I was looking at the sidebar and saw this topic, but you're asking about patching when I saw the comments.
The best way to make sure is that the user knows what to do with it. You want the user to download X file and have Y output for a different purpose. On the other hand, it appears common practice is that chunks of resources for those not native to the Android app and not able to fit in the APK.
A comparable example is the JDIC apps, which use the popular Japanese resource that are in tandem used for English translations. JDIC apps like WWWJDIC use online downloads for the extremely large reference files that would otherwise have bad latency (which have been mentioned before) on Google servers. It's also bad rep to have >200 MB on Google apps unless it is 3D, which is justifiable. If your images cannot be compressed without extremely long loading times on the app itself, you may need to consider this option. The only downside is to request online connection (also mentioned before).
Also, you could use 7zip and program Android to self-extract it to a location. http://www.wikihow.com/Use-7Zip-to-Create-Self-Extracting-excutables
On another note, it would be optimal for the user to perform routine checks on the app while having a one-time download on initial startup. You can then optionally put in an AsyncTask so that your files will be downloaded to the app and used after restart or however you want it, so you really need only one AsyncTask. The benefit of this is that the user syncs on the apps and he may need to check only once. The downside is that the user may not always be able to update and may need to use 4G or LTE, but that is a minor concern if he can use WiFi whenever he wants.
I'm trying to run some code while data is still being uploaded. Is it possible with Apache+PHP?
Basically, I'm trying to read from php://input before client->server upload completes. The upload itself may take several hours and I'd like to do some logging while it's working. I don't mind reading from a blocking file descriptor.
Cheers
No, your script does not take control until the upload has completed. The APC stuff mentioned in a comment above is limited to PHP writing out upload status to a file elsewhere on the system, which can then be read by another script to send out as progress information.
But while an upload is going, you cannot do anything with that connection until the upload completes or is aborted.
PHP is also badly suited for very large uploads - consider what happens if your link were to hiccup 10bytes from the end of that very long/large upload. Since PHP is not in control of the upload, there's no way to resume. FTP/SFTP are better suited for such things, especially since they won't delete an aborted upload.
I created an simple web interface to allow various users to upload files. I set the upload limit to 100mb but now it turns out that the client occasionally wants to upload files 500mb+.
I know what to alter the php configuration to change the upload limit but I was wondering if there are any serious disadvantages to uploading files of this size via php?
Obviously ftp would be preferable but if possible i'd rather not have two different methods of uploading files.
Thanks
Firstly FTP is never preferable. To anything.
I assume you mean that you transferring the files via HTTP. While not quite as bad as FTP, its not a good idea if you can find another of solving the problem. HTTP (and hence the component programs) are optimized around transferring relatively small files around the internet.
While the protocol supports server to client range requests, it does not allow for the reverse operation. Even if the software at either end were unaffected by the volume, the more data you are pushing across the greater the interval during which you could lose the connection. But the biggest problem is that caveat in the last sentence.
Regardless of the server technology you use (PHP or something else) it's never a good idea to push that big file in one sweep in synchronous mode.
There are lots of plugins for any technology/framework that will do asynchronous upload for you.
Besides the connection timing out, there is one more disadvantage in that file uploading consumes the web server memory. You don't normally want that.
PHP will handle as many and as large a file as you'll allow it. But consider that it's basically impossible to resume an aborted upload in PHP, as scripts are not fired up until AFTER the upload is completed. The larger the file gets, the larger the chance of a network glitch killing the upload and wasting a good chunk of time and bandwidth. As well, without extra work with APC, or using something like uploadify, there's no progress report and users are left staring at a browser showing no visible signs of actual work except the throbber chugging away.
I need to upload potentially big (as in, 10's to 100's of megabytes) files from a desktop application to a server. The server code is written in PHP, the desktop application in C++/MFC. I want to be able to resume file uploads when the upload fails halfway through because this software will be used over unreliable connections. What are my options? I've found a number of HTTP upload components for C++, such as http://www.chilkatsoft.com/refdoc/vcCkUploadRef.html which looks excellent, but it doesn't seem to handle 'resume' of half done uploads (I assume this is because HTTP 1.1 doesn't support it). I've also looked at the BITS service but for uploads it requires an IIS server. So far my only option seems to be to cut up the file I want to upload into smaller pieces (say 1 meg each), upload them all to the server, reassemble them with PHP and run a checksum to see if everything went ok. To resume, I'd need to have some form of 'handshake' at the beginning of the upload to find out which pieces are already on the server. Will I have to code this by hand or does anyone know of a library that does all this for me, or maybe even a completely different solution? I'd rather not switch to another protocol that supports resume natively for maintenance reasons (potential problems with firewalls etc.)
I'm eight months late, but I just stumbled upon this question and was surprised that webDAV wasn't mentioned. You could use the HTTP PUT method to upload, and include a Content-Range header to handle resuming and such. A HEAD request would tell you if the file already exists and how big it is. So perhaps something like this:
1) HEAD the remote file
2) If it exists and size == local size, upload is already done
3) If size < local size, add a Content-Range header to request and seek to the appropriate location in local file.
4) Make PUT request to upload the file (or portion of the file, if resuming)
5) If connection fails during PUT request, start over with step 1
You can also list (PROPFIND) and rename (MOVE) files, and create directories (MKCOL) with dav.
I believe both Apache and Lighttpd have dav extensions.
You need a standard size (say 256k). If your file "abc.txt", uploaded by user x is 78.3MB it would be 313 full chunks and one smaller chunk.
You send a request to upload stating filename and size, as well as number of initial threads.
your php code will create a temp folder named after the IP address and filename,
Your app can then use MULTIPLE connections to send the data in different threads, so you could be sending chunks 1,111,212,313 at the same time (with separate checksums).
your php code saves them to different files and confirms reception after validating the checksum, giving the number of a new chunk to send, or to stop with this thread.
After all thread are finished, you would ask the php to join all the files, if something is missing, it would goto 3
You could increase or decrease the number of threads at will, since the app is controlling the sending.
You can easily show a progress indicator, either a simple progress bar, or something close to downthemall's detailed view of chunks.
libcurl (C api) could be a viable option
-C/--continue-at
Continue/Resume a previous file transfer at the given offset. The given offset is the exact number of bytes that will be skipped, counting from the beginning of the source file before it is transferred to the destination. If used with uploads, the FTP server command SIZE will not be used by curl.
Use "-C -" to tell curl to automatically find out where/how to resume the transfer. It then uses the given output/input files to figure that out.
If this option is used several times, the last one will be used
Google have created a Resumable HTTP Upload protocol. See https://developers.google.com/gdata/docs/resumable_upload
Is reversing the whole proccess an option? I mean, instead of pushing file over to the server make the server pull the file using standard HTTP GET with all bells and whistles (like accept-ranges, etc.).
Maybe the easiest method would be to create an upload page that would accept the filename and range in parameter, such as http://yourpage/.../upload.php?file=myfile&from=123456 and handle resumes in the client (maybe you could add a function to inspect which ranges the server has received)
# Anton Gogolev
Lol, I was just thinking about the same thing - reversing whole thing, making server a client, and client a server. Thx to Roel, why it wouldn't work, is clearer to me now.
# Roel
I would suggest implementing Java uploader [JumpLoader is good, with its JScript interface and even sample PHP server side code]. Flash uploaders suffer badly when it comes to BIIIGGG files :) , in a gigabyte scale that is.
F*EX can upload files up to TB range via HTTP and is able to resume after link failures.
It does not exactly meets your needs, because it is written in Perl and needs an UNIX based server, but the clients can be on any operating system. Maybe it is helpful for you nevertheless:
http://fex.rus.uni-stuttgart.de/
Exists the protocol called TUS for resumable uploads with some implementations in PHP and C++