I have a general question about this.
When you have a gallery, sometimes people need to upload thousands of images at once. Most likely, it would be done through a .zip file. What is the best way to go about uploading this sort of thing to a server? Servers often have timeouts etc. that need to be accounted for. I am wondering what kinds of things I should be looking out for and what the best way is to handle a large number of images being uploaded.
I'm guessing that you would allow a user to upload a zip file (assuming the timeout does not affect you), and this zip file is uploaded to a specific directory; let's assume in this case a directory is created for each user in the system. You would then unzip the file on the server and scan the user's folder for any directories containing .jpg, .png or .gif files (etc.), and then import them into a table accordingly, I'm guessing labeled by folder name.
What kind of server side troubles could I run into?
I'm aware that there may be many issues. Even general ideas would be good so I can then research further. Thanks!
Also, I would be programming in Ruby on Rails, but I think this question applies across any language.
There's no reason why you couldn't handle this kind of thing with a web application. There are a couple of excellent components that would be useful for this:
Uploadify (based on jquery/flash)
plupload (from moxiecode, the tinymce people)
The reason they're useful is that, in the first instance, Uploadify uses a Flash component to handle uploads, so you can select groups of files from the file browser window (assuming no one is going to individually select thousands of images..!), and with plupload, drag and drop is supported too, along with more platforms.
Once you've got your interface working, the server side stuff just needs to be able to handle individual uploads, associating them with some kind of user account, and from there it should be pretty straightforward.
With regard to server-side issues, that's really a big question, depending on how many people will be using the application at the same time, the size of the images, and any processing that takes place afterwards. Remember, the files are kept in a temporary location while the script is processing them, and are either deleted upon completion or copied to a final storage location by your script, so space, memory overheads and timeouts could all be an issue.
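For the receiving end, a minimal PHP sketch of the "upload a zip, extract it, scan for images" flow described in the question might look something like this (the question mentions Rails, so treat it as language-agnostic pseudocode; the 'archive' field name, directory layout and $userId are all made up):

    <?php
    // Minimal sketch: accept one uploaded zip, extract it into the user's
    // directory, then collect image paths for a later database import.
    $userId  = 42; // would come from the session in a real app
    $destDir = "/var/uploads/users/{$userId}/" . uniqid('batch_', true);

    if (!is_dir($destDir) && !mkdir($destDir, 0750, true)) {
        http_response_code(500);
        exit('Could not create upload directory');
    }

    // The upload only exists in PHP's temporary directory for the lifetime of
    // this request, so move it somewhere permanent first.
    $zipPath = $destDir . '/upload.zip';
    if (!move_uploaded_file($_FILES['archive']['tmp_name'], $zipPath)) {
        http_response_code(400);
        exit('Upload failed');
    }

    $zip = new ZipArchive();
    if ($zip->open($zipPath) !== true) {
        exit('Not a valid zip archive');
    }
    $zip->extractTo($destDir);
    $zip->close();

    // Walk the extracted tree and keep only image files, grouped by folder name.
    $images = [];
    $iter = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($destDir));
    foreach ($iter as $file) {
        if ($file->isFile()
            && in_array(strtolower($file->getExtension()), ['jpg', 'jpeg', 'png', 'gif'])) {
            $images[basename($file->getPath())][] = $file->getPathname();
        }
    }
    // $images can now be written to an albums/photos table, one row per file.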
If the images are massive in size, say RAW or TIFF, then this kind of thing could still work with chunked uploads, but implementing some kind of FTP upload might be easier. It's a bit of a vague question, but there should be plenty here to get you going ;)
For that many images it has to be a serious app, which gives you the liberty to suggest a piece of software running on the client (something like Yahoo Mail or Picasa does) that will take care of managing the upload of the images (network interruptions, resume support, etc.).
For the server side, you could process these one at a time (assuming your client is sending them that way), thus keeping it simple.
Take a peek at http://gallery.menalto.com; they have a dozen methods for uploading pictures into galleries, and you can choose the one that suits you.
Either have a client app, or some Ajax code that sends the images one by one, preventing timeouts. Alternatively, if this is not available to the public, FTP still works...
I'd suggest a client application (maybe written in AIR or Titanium) or telling your users what FTP is.
deviantArt.com for example offers FTP as an upload method for paying subscribers and it works really well.
Flickr instead has its own app for this, the "Flickr Uploadr".
Now I've read a bunch of SO topics on how to check whether PHP uploads are virus-safe, and the gist of it is: I can't 100% guarantee that uploads aren't full of viruses, no matter the extension. One proposed solution is to remove the extension during the upload and then reassemble it when people want to download.
However, I want to let users view files directly on the website. How do I go about doing that? For example, generating an iframe with an uploaded PDF inside - is that safe or is it like executing it which would give potential viruses the opportunity to spread? With DOCs I wanted to use Google Docs, so I'd embed an iframe of Google Docs which GETs a URL of the DOC on my server. Is that safe then?
Or is there simply no way other than only allowing downloads to prevent potential viruses from spreading on the server? If so, how does the reassembling of the extension work? I'd guess, when someone uploads a test.exe, I'd remove the .exe part but store it in a database. Then when someone requests the download, I rename the test file to test.exe and push the download. After that I rename it back to test. Is that correct?
Also: how do services like Trello do this? When I upload an image file there, it gets shown directly, without any noticeable delay for virus scans or whatever. I thought about using the virustotal.com API, but that surely takes quite a while, doesn't it? Would it be okay, though, to let people upload, not show the files publicly until a virustotal.com scan is done, and only then consider them safe?
Thanks and cheers for all help and sorry, if I missed something.
There are a few approaches I've seen in practice over the years:
Scan it locally, using e.g. ClamAV (there's a minimal sketch of this after the list below).
Pro: If your virus definitions are up-to-date, you'll catch any known viruses this way.
Con: Anti-virus software is an attack surface. See many of the findings of Tavis Ormandy from Google Project Zero.
Con: Could be taxing to server resources. (Maybe spin up a different server dedicated to AV purposes?)
Use an API, such as VirusTotal.
Pro: Less attack surface.
Con: You have to share the file with VirusTotal, which might be a bad idea if the files you're letting users upload are particularly sensitive (e.g. protected health information).
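For the first option, the local scan can be as simple as shelling out to clamdscan once the upload arrives. A rough PHP sketch (it assumes the ClamAV daemon is installed and running on the same box, and that clamd can read PHP's temp directory; otherwise look at clamdscan's --fdpass option):

    <?php
    // Sketch: scan an uploaded file with the local ClamAV daemon via clamdscan.
    // clamdscan exits with 0 = clean, 1 = virus found, 2 = error.
    $tmp = $_FILES['upload']['tmp_name'];
    exec('clamdscan --no-summary ' . escapeshellarg($tmp), $output, $exitCode);

    if ($exitCode === 1) {
        unlink($tmp);
        http_response_code(422);
        exit('Upload rejected: malware detected');
    } elseif ($exitCode !== 0) {
        http_response_code(500);
        exit('Scanner unavailable, upload not accepted');
    }
    // Exit code 0: clean, carry on with move_uploaded_file() etc.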
I'm not sure which to recommend, because I don't know your threat model or operational constraints.
However, the more general problem of not serving browser exploits (e.g. XSS) or allowing reverse shells on the server is actually somewhat easy, but not trivial.
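On that last point, the usual pattern is to keep uploads outside the web root under a random name with no extension, and only hand the original filename back in the download headers, rather than renaming the file back and forth on disk. A rough PHP sketch of that variant (table, field and path names are all invented):

    <?php
    // --- on upload: store under a random, extension-less name ---
    $original = basename($_FILES['doc']['name']);              // e.g. "report.pdf"
    $storedAs = bin2hex(openssl_random_pseudo_bytes(16));      // random name, no extension
    move_uploaded_file($_FILES['doc']['tmp_name'], "/var/private_uploads/$storedAs");
    // ...and record ($storedAs, $original) in an uploads table.

    // --- on download: force a download, never serve the file in place ---
    // In a real app this row would come from the database; hard-coded for the sketch.
    $row = ['stored_as' => $storedAs, 'original_name' => $original];
    header('Content-Type: application/octet-stream');
    header('X-Content-Type-Options: nosniff');
    header('Content-Disposition: attachment; filename="' . str_replace('"', '', $row['original_name']) . '"');
    readfile('/var/private_uploads/' . $row['stored_as']);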
My application requires downloading many images from the server (each image is about 10 KB). And I'm simply downloading each of them with an independent AsyncTask without any optimization.
Now I'm wondering what the common practice is for transferring these images. For example, I'm thinking about keeping zipped images on the server, then sending the zipped file for the user's mobile to unzip. In this case, is it better to combine the zip files into one big zip file for the user to download?
Or there's better solution? Thanks in advance!
EDIT:
It seems combining the zip files is a good idea, but I feel it may take too long for the user to wait for all the images to download and unzip. So I may put ten or twenty images in each zip file, so the user can see some downloaded ones while waiting for more to come. Having multiple AsyncTasks fired together can be faster, right? But they won't finish at the same time, even given the same file size and the same address to download from?
Since latency is often the largest problem with mobile connections, reducing the number of connections you have to open is a great way to optimize the loading times. Sending a zip file with all the images sounds like a very good idea, and is probably worth the time implementing.
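If the backend happens to be PHP (a pure assumption on my part; the question doesn't say), packing the images into batches of ten or twenty per zip, as described in the edit, could look roughly like this (paths and batch size are illustrative):

    <?php
    // Sketch: pack images into zip files of $batchSize each, so the app can
    // start showing the first batch while later batches are still downloading.
    $batchSize = 20;
    $images = glob('/var/www/gallery/*.{jpg,png,gif}', GLOB_BRACE);

    foreach (array_chunk($images, $batchSize) as $i => $batch) {
        $zip = new ZipArchive();
        $zip->open("/var/www/zips/batch_{$i}.zip", ZipArchive::CREATE | ZipArchive::OVERWRITE);
        foreach ($batch as $path) {
            $zip->addFile($path, basename($path));
        }
        $zip->close();
    }

The app would then request batch_0.zip, batch_1.zip, ... in order and unpack each one as it arrives.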
The images are probably already compressed (GIF, JPG, PNG), so you will not reduce the file size, but you will reduce the number of connections, which is a good idea for mobile. If it is always the same set of images you can use some sprite technique: send one bigger image file containing all the images, and display the right one using an x/y offset (in HTML you can use the background with an offset to show the right image).
I was looking at the sidebar and saw this topic, but from the comments it looks like you're also asking about patching.
The best way to make sure is that the user knows what to do with it: you want the user to download file X and get output Y for a different purpose. On the other hand, it appears to be common practice to download chunks of resources that are not native to the Android app and would not fit in the APK.
A comparable example is the JDIC apps, which use a popular Japanese dictionary resource that is used in tandem for English translations. JDIC apps like WWWJDIC use online downloads for the extremely large reference files that would otherwise have bad latency (which has been mentioned before) on Google servers. It's also bad rep to have >200 MB on Google apps unless it is 3D, which is justifiable. If your images cannot be compressed without extremely long loading times in the app itself, you may need to consider this option. The only downside is having to request an online connection (also mentioned before).
Also, you could use 7zip and program Android to self-extract it to a location. http://www.wikihow.com/Use-7Zip-to-Create-Self-Extracting-excutables
On another note, it would be optimal for the user to perform routine checks in the app while having a one-time download on initial startup. You can then optionally put in an AsyncTask so that your files will be downloaded to the app and used after a restart, or however you want it, so you really need only one AsyncTask. The benefit of this is that the app stays in sync and the user may need to check only once. The downside is that the user may not always be able to update and may need to use 4G or LTE, but that is a minor concern if they can use WiFi whenever they want.
I'm building a PHP based upload service for some of our clients. I am using SWFUpload so that I can view the progress of a file as it uploads. I've got it pretty much built, but am running into one last issue before we can release it to the public.
Many (almost all) of our clients are Mac-based and are uploading sets of files that include InDesign files, fonts, Illustrator files, etc. Most of the time the image files are OK, but occasionally (and always with Type 1 fonts) a file will become corrupted because it loses its resource fork.
I understand why this is happening (moving from a multi-fork system to a single-fork system), but I cannot find any elegant solution. In my research the best answer I've found so far is "have the user compress it". I know that works, but it's unreasonable, in our client's opinion, for us to require them to compress every set of files they are going to send.
Are there any better solutions for keeping those resource forks alive? Of course, I would prefer a solution that is straight javascript/php, but would settle for something that is flash based or (least preferably) java based.
My only requirements for the new solution would be:
View upload progress
User doesn't have to manually compress files
Here's some information about my system
Ubuntu 10.10 Server running a standard LAMP install
PHP5
SWFUpload (whatever the most recent version is)
Uploads handle files. If the browser and the underlying OS are not able to deal with forks in this procedure (mapping anything file-like onto the file model for uploads), then you're bound to what the system's architecture gives you.
Resource fork: The resource fork is a construct of the Mac OS operating system used to store structured data in a file, alongside unstructured data stored within the data fork. A resource fork stores information in a specific form, such as icons, the shapes of windows, definitions of menus and their contents, and application code (machine code).
If that's a blocker for you, you might have chosen the wrong field to work in. Just saying: if you run into systemic limits, there is not much you can do about them, even if you work for graphic designers and Mac users.
The swfupload would need a feature to deal with forks. For that, flash would need a feature to deal with forks. For that the browser would eventually need a feature to deal with forks. And so on.
Next to this chain, another question remains: How to deal with forks? As the upload only maps one file to a chunk of binary data, how to map the fork as well? Append it? Add an additional file?
So on the technical level this does not sound easily solvable. All components and systems in the file input chain must support a feature that is commonly not supported at all.
So as you can not offer something to the user that does not exist, the only thing you can do is make your application more usable or user-friendly. E.g. by providing the right notes at the right time (e.g. when a user selects a Type 1 file for uploading, to remind him/her to select the fork as well). Communicating with the user can help, but keep in mind that a user needs to be spoken with in a language he/she understands.
So if you know that certain file types have forks, address the issue to someone who can solve it: The user. You can't.
You don't have to use swfupload to monitor progress.
Here are some files that demonstrate this: https://github.com/senica/Booger/tree/master/assets/js/jquery-upload
It is not documented very well, but it basically uses the webkitSlice function for uploading the files in JavaScript. You can use the callback functions to display the progress of the files.
This would be a javascript/php solution.
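The PHP end of a sliced upload like that can stay very small. A rough sketch (the field names are my own, the target directories are assumed to exist, and it assumes the JavaScript sends the chunks one at a time, in order):

    <?php
    // receive_chunk.php -- the JS side cuts the file with Blob.slice()/webkitSlice()
    // and POSTs one chunk at a time together with its index.
    $name  = basename($_POST['filename']);   // sanitised original name
    $index = (int) $_POST['chunk_index'];    // 0-based chunk number
    $total = (int) $_POST['chunk_total'];

    $partPath = "/var/uploads/parts/{$name}.part";

    // Chunk 0 starts a fresh file; later chunks are appended.
    $flags = $index === 0 ? 0 : FILE_APPEND;
    file_put_contents($partPath, file_get_contents($_FILES['chunk']['tmp_name']), $flags);

    if ($index + 1 === $total) {
        rename($partPath, "/var/uploads/complete/{$name}");
        echo 'done';
    } else {
        echo 'ok';
    }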
I created a simple web interface to allow various users to upload files. I set the upload limit to 100 MB, but now it turns out that the client occasionally wants to upload files of 500 MB+.
I know how to alter the PHP configuration to change the upload limit, but I was wondering if there are any serious disadvantages to uploading files of this size via PHP?
Obviously FTP would be preferable, but if possible I'd rather not have two different methods of uploading files.
Thanks
Firstly FTP is never preferable. To anything.
I assume you mean that you are transferring the files via HTTP. While not quite as bad as FTP, it's not a good idea if you can find another way of solving the problem. HTTP (and hence the programs built on it) is optimized for transferring relatively small files around the internet.
While the protocol supports server-to-client range requests, it does not allow for the reverse operation, so an interrupted upload cannot be resumed. Even if the software at either end were unaffected by the volume, the more data you are pushing across, the greater the window during which you could lose the connection; and that caveat about resuming is the biggest problem.
Regardless of the server technology you use (PHP or something else), it's never a good idea to push a file that big in one sweep in synchronous mode.
There are lots of plugins for any technology/framework that will do asynchronous upload for you.
Besides the connection timing out, there is one more disadvantage in that file uploading consumes the web server memory. You don't normally want that.
PHP will handle as many and as large a file as you'll allow it. But consider that it's basically impossible to resume an aborted upload in PHP, as scripts are not fired up until AFTER the upload is completed. The larger the file gets, the larger the chance of a network glitch killing the upload and wasting a good chunk of time and bandwidth. As well, without extra work with APC, or using something like uploadify, there's no progress report and users are left staring at a browser showing no visible signs of actual work except the throbber chugging away.
I code primarily in PHP and Perl. I have a client who is insisting on seeking video submissions (any encoding) from the public via one of their pages rather than letting YouTube do its job.
Server in question is a virtual machine and I can adjust ini settings for max post, max upload size etc as needed.
My initial thought is to use a Flash based uploader with PHP on the back end but I wondered if someone might have useful advice and experience on the subject?
Doing large file transfers over HTTP is not usually fun -- but sometimes it's necessary.
For large files, you'll definitely want to provide some kind of progress gauge for end-users.
There are flash-based tools that do this (swfUpload comes to mind).
If you want to avoid Flash and do it with pretty HTML/JavaScript/CSS, you can leverage PHP's APC extension, which (somewhat surprisingly) provides support for getting upload status from the server, as explained here.
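In outline the APC route looks like this (it needs the APC extension with apc.rfc1867 enabled, the hidden field has to come before the file input, and the key is something you generate per upload; untested sketch):

    <!-- upload form -->
    <form enctype="multipart/form-data" action="upload.php" method="POST">
      <input type="hidden" name="APC_UPLOAD_PROGRESS" value="unique-key-123">
      <input type="file" name="video">
      <input type="submit" value="Upload">
    </form>

    <?php
    // progress.php -- polled by JavaScript while the upload is running.
    // apc_fetch() returns an array with keys like 'total', 'current' and 'done'.
    $status = apc_fetch('upload_unique-key-123');
    header('Content-Type: application/json');
    echo json_encode($status ?: null);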
You can adjust the post size and use a normal HTML form. The big problem is not Apache, it's HTTP. If anything goes wrong in the transmission you will have no way to detect the error. Furthermore, there is no way to resume the transfer. This is exactly why BitTorrent is so popular.
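For the "adjust the post size" part, these are roughly the php.ini directives involved (values purely illustrative; post_max_size needs to be a bit larger than upload_max_filesize):

    ; php.ini -- accepting large uploads
    upload_max_filesize = 600M
    post_max_size       = 650M
    max_execution_time  = 1800
    max_input_time      = 1800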
I don't know how set against YouTube your client is, but you can use their API to do the uploads from a page on your site.
http://code.google.com/apis/youtube/2.0/developers_guide_protocol.html#Uploading_Videos
See: browser based uploading.
For web-based uploads, there aren't many options. Regardless of web platform, web server, etc., you're still transferring over HTTP. The transfer is all or nothing.
Your best option might be to find a Flash, Java, or other client side option that can chunk files and upload them piecemeal, then do a checksum to verify. That will allow for resuming uploads. Unfortunately, I don't know of any such open source component that does this.
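The server side of that "chunk, resume, then verify" idea is not much code. A rough PHP sketch of the status/verification endpoint such a client-side component would talk to (all paths and parameter names are invented):

    <?php
    // status.php -- lets the client ask how much of a file the server already
    // has (so an interrupted upload can resume from that byte offset), and
    // verify the finished file against a checksum the client computed locally.
    $name = basename($_GET['filename']);
    $part = "/var/uploads/parts/{$name}.part";

    if (isset($_GET['md5'])) {
        // Final verification after the last chunk has been assembled.
        $ok = md5_file("/var/uploads/complete/{$name}") === $_GET['md5'];
        echo $ok ? 'verified' : 'corrupt';
    } else {
        // Resume point: number of bytes already stored for this file.
        echo file_exists($part) ? filesize($part) : 0;
    }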
Try to convince your client to change their point of view.
Using HTTP (and the browser, hell, the browser!) for this kind of thing is rarely a good deal. Will his users wait 40 minutes with the computer and the browser running until the upload is complete?
I don't think so.
Maybe you could set up a public FTP account, where users can upload but not download or see other users' files. Then those who want to use FTP software can, and those who like to do it via the browser can too.
The big problem with using a browser is that, if something goes wrong, you can't resume but have to restart from zero again.
Last year I had the same issue; I took a look at ZUpload, but I didn't use it, so I can't really vouch for it. (We wrote a small Python script that we send to our customers; the script creates a torrent of the folder the customer needs to send to us, and we download it via uTorrent ;)
p.s: again, sorry for my bad english ;)
I used jupload. Yes it looks horrible, but it just works.
With that said, it's still a better idea to convince the client that doing so is stupid.
I would agree with others stating that using HTML is a poor option. I believe there is a size limitation using Flash as well. I know of a script that uses a JavaScript Applet to perform an actual FTP transfer. It is called Simple2FTP and can be found at http://www.simple2ftp.com
Not sure but perhaps worth a try?