Handling Very Large Uploads [duplicate] - php

I want to allow uploads of very large files into our PHP application (hundreds of megabytes, up to 8 GB). There are a couple of problems with this, however.
Browser:
Plain HTML uploads give crappy feedback: we need to either poll for progress (which is a bit silly) or show no feedback at all.
The Flash uploader puts the entire file into memory before starting the upload.
Server:
PHP forces us to set post_max_size, which could result in an easily exploitable DoS attack. I'd rather not set this globally.
The server also requires some other variables to be there in the POST vars, such as a secret key. We'd like to be able to refuse the request right away, instead of only after the entire file has been uploaded.
Requirements:
HTTP is a must.
I'm flexible with client-side technology, as long as it works in a browser.
PHP is not a requirement, if there's some other technology that will work well on a linux environment, that's perfectly cool.

upload_max_filesize can be set on a per-directory basis; the same goes for post_max_size
e.g.:
<Directory /uploadpath/>
php_value upload_max_filesize 10G
php_value post_max_size 10G
</Directory>

Python Handler?
Use a Python POST handler instead of PHP. Generate a unique identifier from your PHP app that the client can put in the HTTP headers, then use mod_python to reject or accept the large upload before the entire POST body is transmitted.
I think
http://www.modpython.org/live/current/doc-html/dir-handlers-hph.html
It allows you to check headers and decline the rest of the POST input. I haven't tried it, but it might be the right path?
Looking at the source of mod_python, the buffering of the input via read() seems to allow bit-at-a-time evaluation of the HTTP input. Headers are first.
https://svn.apache.org/repos/asf/quetzalcoatl/mod_python/trunk/src/filterobject.c

It's old, I know, but maybe someone has this problem nowadays too.
Now you can do this with only JavaScript and, say, PHP. No Flash or Java required on the client side.
demo: http://dnduploader.filkor.org/
The idea is to slice the files with JavaScript's Blob slice() method...
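For context, here's a rough sketch of what the PHP receiving end of that approach might look like, assuming the client posts each slice with hypothetical upload_id, chunk_index and total_chunks fields and a file field named "chunk" (those names, and the paths, are made up for illustration):
<?php
// Hedged sketch: reassemble chunks produced client-side with Blob.slice().
// Field names and directories are assumptions, not from any particular library.
$uploadId    = isset($_POST['upload_id']) ? preg_replace('/[^A-Za-z0-9]/', '', $_POST['upload_id']) : '';
$chunkIndex  = isset($_POST['chunk_index']) ? (int)$_POST['chunk_index'] : 0;
$totalChunks = isset($_POST['total_chunks']) ? (int)$_POST['total_chunks'] : 0;

if ($uploadId === '' || !isset($_FILES['chunk'])) {
    header('HTTP/1.1 400 Bad Request');
    exit('Bad request');
}

$partial = sys_get_temp_dir() . "/upload_{$uploadId}.part";

// Append this chunk to the partial file (truncate on the first chunk).
$in  = fopen($_FILES['chunk']['tmp_name'], 'rb');
$out = fopen($partial, $chunkIndex === 0 ? 'wb' : 'ab');
stream_copy_to_stream($in, $out);
fclose($in);
fclose($out);

// Once the last chunk arrives, move the assembled file into place.
if ($chunkIndex === $totalChunks - 1) {
    rename($partial, "/path/to/uploads/{$uploadId}.bin");
}
echo 'ok';
A nice side effect is that each individual POST is small, so post_max_size can stay at a sane value.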

How about a Java applet? That's how we had to do it at a company I previously worked for. I know applets suck, especially in this day and age with all our options available, but they really are the most versatile solution to desktop-like problems encountered in web development. Just something to consider.

You can set post_max_size for just the scripts in one directory. Place your upload script there, and allow only that script to handle large sizes. It's still possible for that script to be attacked with large/useless files, but it avoids setting the limit globally.
Use that together with APC and you might be able to work out something good:
IBM developerWorks article on APC

Tried all of this... this is by far the best I have used yet...
http://www.uploadify.com/

Take a look at jumploader.com
A good Java applet for uploading.
I've used it for uploading images and it works fine. I haven't tried it with files bigger than 10 MB, but it should work for really big files too.

Have you looked into using APC to check the progress and total file size? Here is a good blog post about it. It might help.

Maybe you could use WebDAV and JavaScript in the browser.
AJAX Big file upload, with progress, to WebDAV
http://www.webdavsystem.com/ajax/programming/upload_progress
A simple library
http://debris.demon.nl/projects/davclient.js/doc/README.html
You can then get the JS to redirect the user to a success page. Secret keys and whatnot can be handled in a PHP prelude before handing off from the JS client to WebDAV.

I would look into FTP, SSH, or SCP. These allow you to upload a large file and still have access control over the file as well. This might take a little longer to implement, but it's probably the most secure way I can think of.

I know it sucks to add another dependency, but in my experience most websites doing something like this are using Flash on the client side and uploading the large file in chunks.
Adobe has a howto on Flash file uploads.
I also found this tutorial on CodeProject:
Multiple File Upload With Progress Bar Using Flash and ASP.NET
PS - I know you're using PHP and not .NET; I figured the important part was the Flash ;)

I've had success with Uploadify, and I would recommend it. It's a jQuery/Flash script that handles large uploads, and you can pass extra parameters to it (like the secret key). To solve the server-side issues, simply use the following code. The changes take effect just for the script they're called in:
//Check to see if the secret key is there and valid
if(!isset($_POST['secret_key']) || !isValid($_POST['secret_key']))
{
    exit("Invalid request");
}

function isValid($key)
{
    //Put your validation code here and return true/false.
    return $key === 'my-secret-key'; //placeholder check
}

//This line changes the timeout.
//Give it a value in seconds (3600 = 1 hour)
set_time_limit(3600);

//Set these amounts to whatever you need.
//Caveat: post_max_size and upload_max_filesize are PHP_INI_PERDIR settings,
//so ini_set() won't actually change them at runtime; set them in php.ini
//or a per-directory .htaccess instead.
ini_set("post_max_size", "8192M");
ini_set("upload_max_filesize", "8192M");

//Generally speaking, the memory_limit should be higher
//than your post size. So make sure that's right too.
ini_set("memory_limit", "8200M");
EDIT In response to your comment:
Given what you've said, I'm afraid you may not be able to meet your requirements over HTTP. All of the solutions out there are code that adds features to HTTP that it was never designed for.
Like you said yourself, it's a simple protocol. Short of writing your own client software that runs outside of the browser, using a Java applet, or using a different protocol (like FTP, which was designed for this), you might not get what you want.
I've done the best I could within the given constraints. Sorry I couldn't do better.

Try this: http://www.simple2ftp.com uses a Java-based FTP applet from within a clever PHP application wrapper.

Related

Transfer large data from one server to another

I'm currently trying to transfer large data from one server to another using PHP cURL (posting the data). In some cases the remote server is getting incomplete (corrupted) data.
Is there any other way to achieve this reliably?
EDIT - 1
Using FTP seems like a good idea; would anybody like to say that it is bad, or that I should avoid it for any reason? (Suggestions - @Ed Heal, @Neo)
I would guess your PHP session is timing out. See How to increase the execution timeout in php?
Or you could get cURL to run in its own thread. Call it from a bash script, maybe.
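If you do stay with cURL, here is a hedged sketch of streaming the file from disk instead of building the whole POST body in memory, with the time limits relaxed; it switches to an HTTP PUT body, and the URL and path are placeholders:
<?php
// Hedged sketch: push a large file to the remote server with cURL,
// streaming from disk rather than holding the data in a POST string.
$file = '/path/to/large-export.xml.gz';

$ch = curl_init('http://example.com/receive.php');
curl_setopt($ch, CURLOPT_PUT, true);                  // stream the body as a PUT
curl_setopt($ch, CURLOPT_INFILE, fopen($file, 'rb'));
curl_setopt($ch, CURLOPT_INFILESIZE, filesize($file));
curl_setopt($ch, CURLOPT_TIMEOUT, 0);                 // no overall time limit
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

set_time_limit(0);                                    // don't let PHP kill the script
$response = curl_exec($ch);
if ($response === false) {
    error_log('Transfer failed: ' . curl_error($ch));
}
curl_close($ch);
The receiving side would read the raw request body rather than $_POST, so this is only a sketch of the sending half.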
Posting large files is not what HTTP is for. FTP is for transferring files. Hence the name.
But if you are stuck on using HTTP, you can take a look at the WebDAV extensions to HTTP. There is a PHP library called SabreDAV that you should take a look at:
http://code.google.com/p/sabredav/
You can even use SCP, so that the data transfer is secure as well. You should be able to find libraries to do so. PHP's built-in ssh2 functions can also be useful: http://php.net/manual/en/function.ssh2-sftp.php
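For example, with the ssh2 PECL extension the transfer might look roughly like this (host, credentials and paths are placeholders):
<?php
// Hedged sketch using the ssh2 PECL extension mentioned above.
$conn = ssh2_connect('example.com', 22);
if (!$conn || !ssh2_auth_password($conn, 'username', 'password')) {
    exit('SSH connection or authentication failed');
}

// Copy the local file to the remote host over SCP.
if (!ssh2_scp_send($conn, '/path/to/large-file.bin', '/var/data/large-file.bin', 0644)) {
    exit('Transfer failed');
}
echo "Done\n";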
As you say that it is truncated, I would imagine that the server has a file size limitation, i.e. to prevent abuse and denial-of-service attacks.
I would stick to FTP and perhaps compress the files.

How to get progress of file as it's uploading via ajax?

So far I've figured out HOW to upload files asynchronously with Ajax and PHP, no problem there. But I want to get the percentage of the file that's already been uploaded, as it's uploading, and, after hours of research, I can't find a good way to do this without cheating.
Some implementations I've seen used Flash to upload, and getting the percentage in Flash is apparently fairly common, but I'd like to avoid this if I can.
Any ideas?
The core problem is that RFC 1867, the specification for file uploads over HTTP via the multipart/form-data MIME type, does not provide any method for providing file upload progress.
A file upload is actually just a fancy form submit. CGI scripts, PHP, and all other web technologies that rely on a front-end web server to first accept the request might not actually begin executing until the entire upload has completed. This means that they generally can't even know when the upload has started, only when it's been completed.
New versions of PHP's APC extension include a workaround for this problem that performs some level of black magic that allows it to know about uploads earlier. It only works as part of mod_php, though. The devs don't seem to have plans to support it under FastCGI.
Another server-side option would be the "uploadprogress" PECL extension. I'm not entirely sure what kind of black magic it uses. The source suggests that it actually hooks into the processing of the multipart MIME parts. (This suggests that at least some SAPIs stream form data to PHP as the client uploads it. I know that at least some FastCGI servers buffer the entire request before passing it along, so this might not work for you. YMMV.)
Both of these options are for normal file uploads. Ajax -- or rather, XMLHttpRequest -- does not support file upload operations. Most of the workarounds in this area involve creating an iframe and submitting a form there, and that also implies someone else's client-side work. If you're going to go through that level of hoop jumping, you may as well use one of the modern file upload widgets.
Personally, I use Plupload, a Javascript widget that can work with everyone's favorite Javascript library, jQuery. Some others swear by Uploadify. Regardless, both of these widgets offer a high degree of user feedback as to upload progress. They are likely to be easier for you to implement than APC or uploadprogress and have the advantage of being built and tested thoroughly by other people.
Plupload supports multiple upload engines, including HTML5, Gears, Flash, Silverlight, oldschool HTML4 and more. Between HTML5, Flash and Silverlight, you've pretty much just covered 100% of your audience. It also allows you to subscribe to events and have your own code perform magic. For example, if you need server-side file upload progress information, you can have the client regularly send updates to a different script. This would be useful if you regularly have clients uploading huge files and you want to know about it in real time.
tl;dr: Uploading is hard, let's go client-side!
Yeah, I don't like that "cheating" method either. In my opinion, the best method is to use APC and its apc_fetch() function.
Using Ajax to make an apc_fetch() call with a unique key identifying the upload will return what you need, i.e. bytes uploaded / total bytes.
Then simply draw a progress bar with JavaScript.
I have heard Chrome and Safari don't let you make Ajax calls during a POST upload; the workaround involves using an iframe to make the calls with the APC identifier.
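As a rough sketch, the polling endpoint could look like the following. It assumes APC was built with rfc1867 support and apc.rfc1867 = 1, and that the upload form contains a hidden APC_UPLOAD_PROGRESS field (placed before the file input) holding the same unique key; "upload_" is APC's default apc.rfc1867_prefix, so check your configuration.
<?php
// Hedged sketch of the apc_fetch() progress poll described above.
$key    = isset($_GET['key']) ? $_GET['key'] : '';
$status = apc_fetch('upload_' . $key);

header('Content-Type: application/json');
if ($status === false || empty($status['total'])) {
    echo json_encode(array('progress' => 0));
} else {
    // 'current' and 'total' are byte counts maintained by APC during the upload.
    echo json_encode(array('progress' => round($status['current'] / $status['total'] * 100)));
}
Your JavaScript then polls this script every second or so and redraws the progress bar from the returned percentage.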

GZip Compression For JQuery Without Server Access

I am required to build a website with each page under 130 KB. I know that jQuery 1.4.4 is ~28 KB when it's gzipped, but it's 77 KB minified, which is just too much for this particular assignment. I have already built the entire site using jQuery in one implementation or another on each page, so scrapping it would mean days of wasted time.
With that in mind,
1) Can I add content headers to a javascript file to add "Content-Encoding: gzip" without modifying config files on the server end? I'm uploading them to the university server, but I don't have access to the configuration. From the response header, the server is: Apache/1.3.26 (UnitedLinux) mod_ssl/2.8.10 OpenSSL/0.9.6g PHP/4.2.2 mod_perl/1.27
2) From the phpinfo file, I know that ZLIB compression is enabled, but "zlib.output_compression" is not.
3) I realize this can be done using .htaccess. However, I'd like to do it any other way, if possible, since I don't want the school thinking I'm trying to modify their server configuration.
4) Will XHR's setRequestHeader method work here, or is that only good for asynchronous files?
I know this is short notice and all, but my Final presentation is tomorrow, and I'll lose a ton of points if my site is over the size limit. Any help would be much appreciated!
You have two options:
1) Use jQuery hosted somewhere else that supports gzip. The file you need to include is http://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.js. This gives you a lot more advantages, such as parallel downloading of files and quicker page load times.
2) The other option is to use PHP code to gzip the jQuery JS and return it. Here is an example:
http://www.lateralcode.com/gzip-files-with-htaccess-and-php/
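A rough sketch of option 2, assuming zlib is available (as your phpinfo output suggests); the file and script names are placeholders:
<?php
// Hedged sketch: serve the jQuery file gzipped from a plain PHP script,
// for when you can't touch the server configuration.
$js = file_get_contents('jquery-1.4.4.min.js');

header('Content-Type: application/javascript');
$accept = isset($_SERVER['HTTP_ACCEPT_ENCODING']) ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';
if (strpos($accept, 'gzip') !== false) {
    header('Content-Encoding: gzip');
    echo gzencode($js, 9);
} else {
    echo $js; // client didn't ask for gzip, send it uncompressed
}
You would then include it with <script src="jquery.php"></script> (or whatever you name the script) instead of the .js file.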
Use http://ajax.googleapis.com/ajax/libs/jquery/1.4/jquery.min.js, which is GZIPed.

Large file uploads from web pages

I code primarily in PHP and Perl. I have a client who is insisting on seeking video submissions (any encoding) from the public via one of their pages rather than letting YouTube do its job.
Server in question is a virtual machine and I can adjust ini settings for max post, max upload size etc as needed.
My initial thought is to use a Flash based uploader with PHP on the back end but I wondered if someone might have useful advice and experience on the subject?
Doing large file transfers over HTTP is not usually fun -- but sometimes it's necessary.
For large files, you'll definitely want to provide some kind of progress gauge for end-users.
There are flash-based tools that do this (swfUpload comes to mind).
If you want to avoid flash and do it with pretty html/javascript/css, you can leverage PHP's APC extension, which for some reason provides support for getting upload status from the server, as explained here
You can adjust the post size and use a normal HTML form. The big problem is not Apache, it's HTTP. If anything goes wrong in the transmission you will have no way to detect the error. Furthermore, there is no way to resume the transfer. This is exactly why BitTorrent is so popular.
I don't know how against YouTube your client is, but you can use their API to do the uploads from a page on your site.
http://code.google.com/apis/youtube/2.0/developers_guide_protocol.html#Uploading_Videos
See: browser-based uploading.
For web-based uploads, there aren't many options. Regardless of web platform, web server, etc., you're still transferring over HTTP. The transfer is all or nothing.
Your best option might be to find a Flash, Java, or other client-side option that can chunk files and upload them piecemeal, then do a checksum to verify. That would allow for resuming uploads. Unfortunately, I don't know of any open-source component that does this.
Try to convince your client to change their point of view.
Using HTTP (and the browser, hell, the browser!) for this kind of task is rarely a good deal; will his users wait 40 minutes with the computer and the browser running until the upload is complete?
I don't think so.
Maybe you could set up a public FTP account where users can upload but not download or see other users' files. Then those who want to use FTP software can, and those who prefer to do it via the browser can too.
The big problem with using a browser is that, if something goes wrong, you can't resume; you have to restart from zero again.
This past year I had the same issue. I took a look at ZUpload, but I didn't use it, so I can't really recommend it. (We wrote a small Python script that we send to our customer; the script creates a torrent of the folder our customer needs to send us, and we download it via uTorrent ;)
P.S.: Again, sorry for my bad English ;)
I used JUpload. Yes, it looks horrible, but it just works.
That said, it's still a better idea to convince the client that doing so is stupid.
I would agree with others stating that using plain HTML is a poor option. I believe there is a size limitation when using Flash as well. I know of a script that uses a Java applet to perform an actual FTP transfer. It is called Simple2FTP and can be found at http://www.simple2ftp.com
Not sure but perhaps worth a try?

Sending large files via HTTP

I have a PHP client that requests an XML file over HTTP (i.e. loads an XML file via URL). As of now, the XML file is only several KB in size. A problem I can foresee is that the XML becomes several MB or GB in size. I know that this is a huge question and that there are probably a myriad of solutions, but what ideas do you have to transport this data to the client?
Thanks!
Based on your use case, I'd definitely suggest zipping up the data first. In addition, you may want to md5 hash the file and compare it before initiating the download (no need to update if the file has no changes); this will help with point #2.
Also, would it be possible to just send the segment of XML that has changed instead of the whole file?
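A rough sketch of the hash-check idea, assuming the server also publishes a small .md5 companion file alongside the export; all URLs and file names here are made up, and fetching over HTTP like this requires allow_url_fopen:
<?php
// Hedged sketch: only download the big export when its hash has changed.
$remoteHash = trim(file_get_contents('http://example.com/export.xml.md5'));
$localFile  = '/var/data/export.xml';
$localHash  = file_exists($localFile) ? md5_file($localFile) : '';

if ($remoteHash !== '' && $remoteHash === $localHash) {
    exit("Export unchanged, nothing to do\n");
}

// The file changed (or was never fetched): pull the gzipped copy.
copy('http://example.com/export.xml.gz', $localFile . '.gz');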
Ignoring how well a browser may or may not handle a GB-sized XML file, the only real concern I can think of off the top of my head is if the execution time to generate all the XML is greater than any execution time thresholds that are set in your environment.
PHP's max_execution_time setting
PHP's set_time_limit() function
Apache's TimeOut Directive
Given that the XML is created dynamically with your PHP, the simplest thing I can think of is to ensure that the file is gzipped automatically by the web server, as described here; the link offers a general PHP approach and an Apache httpd-specific solution.
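For the general PHP approach, something as small as the following sketch can do it when zlib is available (the XML body here is obviously a placeholder):
<?php
// Hedged sketch: compress the generated XML on the fly.
// ob_gzhandler only gzips when the client sent Accept-Encoding: gzip.
ob_start('ob_gzhandler');
header('Content-Type: text/xml; charset=utf-8');

echo '<?xml version="1.0" encoding="UTF-8"?>';
echo '<export>';
// ... stream the real rows out here ...
echo '</export>';

ob_end_flush();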
Besides that, having a browser (what else can be a PHP client?) do such a job every night for some data synchronizing sounds like there must be a far simpler solution somewhere else.
And, of course, at some point, transferring "a lot" of data is going to take "a lot" of time...
The problem is that he's syncing up two datasets. The problem is completely misstated.
You need to either a) keep a differential log of changes to dataset A so that you can send that log to dataset B, or b) keep two copies of the dataset (last night's and the current one), and then compare them so you can send the differential log from A to B.
Welcome to the world of replication.
The problem with (a) is that it's potentially invasive to all of your code, though if you're using an RDBMS you could do some logging perchance via database triggers to keep track of inserts/updates/deletes, and write the information in to a table, then export the relevant rows as your differential log. But, that can be nasty too.
The problem with (b) is the whole "comparing the database" all at once. Fine for 100 rows. Bad for 10^9 rows. Nasty nasty.
In fact, it can all be nasty. Replication is nasty.
A better plan is to look into a "real" replication system designed for the particular databases that you're running (assuming you're running a database). Something that perhaps sends database log records over for synchronization rather than trying to roll your own.
Most of the modern DBMS systems have replication systems.
Gallery2, which allows you to upload photos over HTTP, makes you set up a couple of PHP parameters, post_max_size and upload_max_filesize, to allow larger uploads. You might want to look into that.
It seems to me that posting large files has problems with browser time-outs and the like, but on the plus side it works with proxy servers and firewalls better than trying a different file upload protocol.
Thanks for the responses. I failed to mention that transferring the file should be relatively fast (a few minutes max; is this even possible?). The XML that is requested will be parsed and inserted into a database every night. The XML may be the same as the night before, or it may be different. One solution that was proposed is to zip the XML file and then transfer it. So there are basically two requirements: 1. it has to be relatively fast, 2. it should minimize the number of writes to the database.
One solution that was proposed is to zip the XML file and then transfer it, but that only satisfies (1).
Any other ideas?
Are there any algorithms that I could apply to compress the XML? How are large files such as MP3s being downloaded in a matter of seconds?
PHP receiving GBs of data will take a long time and adds overhead.
It's also more susceptible to flaws.
I would dispatch the job to a shell script (wget with simple error catching) that is not bothered by execution time and on failure could perhaps even retry on its own.
I'm not experienced with this, but although one could use exec() or the like, those sadly block until the command finishes.
Calling a script with ./test.sh & makes it run in the background and solves that problem, I guess. The script could easily let your PHP pick it back up via a wget of http://yoursite.com/continue-xml-stuff.php?id=1049381023&status=0. The id could be a filename, if you don't need to backtrack lost requests. The status would indicate how the script ended up handling the request.
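As a sketch of that hand-off (the shell script name is a placeholder; the callback URL is the one from above), the PHP side can fire the script off and return immediately:
<?php
// Hedged sketch: dispatch the long-running fetch to a shell script and
// return right away. fetch-xml.sh is a placeholder; it would run wget and
// then call back continue-xml-stuff.php itself when done.
$id  = 1049381023; // identifier for this transfer, e.g. a filename
$cmd = sprintf('nohup ./fetch-xml.sh %s > /dev/null 2>&1 &', escapeshellarg((string) $id));
exec($cmd);
echo "Transfer {$id} dispatched\n";
Redirecting the output and backgrounding with & is what lets exec() return immediately instead of blocking.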
Have you thought about using some sort of version control system to handle this? You could leverage its ability to calculate and send just the differences in the files, plus you get the added benefits of maintaining a version history of your file.
Since I don't know the details of your situation, I'll throw a question out there. Just for the sake of argument, does it have to be HTTP? FTP is much better suited for large data transfers and can be automated easily via PHP or Perl.
If you are using Apache, you might also consider Apache mod_gzip. This should allow you to compress the file automatically and the decompression should also happen automatically, as long as both sides accept gzip compression.
