Download a list of Objects in a bucket from S3 - php

I have a bucket on Amazon S3 which contains hundreds of objects.
I have a web page that lists out all these objects and has a download object link in html.
This all works as expected and I can download each object individually.
How would it be possible to provide a checkbox next to each link, which allowed a group of objects to be selected and then only those objects downloaded?
So to be clear, if I chose items 1, 2, and 7, then clicked a download link, only those objects would be downloaded. This could be a zip file or one file at a time, although I have no idea how this would work.
I am capable of coding this up, but I am struggling to think HOW it would work, so process descriptions are welcome. I could consider Python or Ruby, although the web app is PHP.

I'm afraid this is a hard problem to solve.
S3 does not allow any 'in place' manipulation of files, so you cannot zip them up into a single download. In the browser, you are stuck with downloading one URL at a time. Of course, there's nothing stopping the user from queuing up downloads manually using a download manager, but there is nothing you can do to help with this.
So you are left with a server-side solution. You'll need to download the files from S3 to a server and zip them up before delivering the zip to the client. Unfortunately, depending on the number and size of files, this will probably take some time, so you'll need a notification system to let the user know when their file is ready.
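As a rough sketch of that server-side flow (assuming the AWS SDK for PHP and the ZipArchive extension; the bucket name and the keys[] form field posted by your checkbox page are placeholders):
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client(['region' => 'us-east-1', 'version' => 'latest']);

$zipPath  = tempnam(sys_get_temp_dir(), 'zip_');
$zip      = new ZipArchive();
$zip->open($zipPath, ZipArchive::OVERWRITE);
$tmpFiles = [];

foreach ($_POST['keys'] as $key) {
    // Download each selected object to a temp file and queue it for the zip.
    $tmp = tempnam(sys_get_temp_dir(), 's3_');
    $s3->getObject(['Bucket' => 'my-bucket', 'Key' => $key, 'SaveAs' => $tmp]);
    $zip->addFile($tmp, basename($key));
    $tmpFiles[] = $tmp;
}

// ZipArchive only reads the source files at close(), so clean up afterwards.
$zip->close();
foreach ($tmpFiles as $tmp) {
    unlink($tmp);
}
// $zipPath is now ready to serve (or to store while you notify the user).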
Also, unless your server is running on EC2, you might be paying bandwidth charges twice: S3 to your server, and then your server to the client.

Best strategy for doing real time, scalable audio processing?

I am building a web application that allows users to upload audio files, music in particular. Most of the time, I expect each song to be a few minutes long and the file to be approximately 3-10 MB in size. However, I would like to accept audio uploads up to about 100 MB, possibly allowing for over an hour of audio. I am currently using a combination of FFmpeg, SoX, and LAME to convert from 7 possible formats to MP3 and perform audio modifications including equalization, trimming, and fading. The files are then stored and linked in the database.
My current strategy is to handle the entire process in one HTTP file upload request using PHP on the backend, in which I perform the following functions:
Validation
Transcode audio into multiple versions (using shell through PHP)
Store the original and transcoded versions in a temp directory
Upload all audio files to Amazon S3 for permanent storage
Commit the ID of each file to a database, linking them to the user
This works very similarly to an image-processing system I have already set up. However, while images can complete this whole process in just a few seconds, audio can take a lot longer. At most, audio could take about 5-10 minutes to be processed and stored.
My questions are:
For audio processing, would it be better to fork off the transcoding to another background process, writing its state to the database, and pinging it every few seconds to update the webpage vs. doing it all in one HTTP request?
With the intention of scaling in the future, would it be advisable to do all processing on a single server instance, leaving the frontend web instances free to replicate / be destroyed?
If yes, would this require cross-domain file uploading directly to that server? (Anyone know if this is how YouTube or the big sites do it?)
Thanks!
If I understand your system correctly, your best approach is probably something more like this:
In your web front-end, store the audio and create a "task" indicating that the audio needs to be processed.
Run a background task that pulls tasks and does the processing. At the end of the task, the user can be notified (if necessary) and database state can be updated or whatever.
Your tasks should be written so that if they fail partway through, they can be re-executed from the start without causing problems. You can run multiple background tasks and web front-ends in this architecture.
A good way to write tasks is using a message-passing system like AMQP; RabbitMQ is a popular broker, and there are cheap hosted services that will run it for you. You can, of course, also build your own on top of any database, but this may require polling.
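For instance, here is a minimal sketch of such a worker using RabbitMQ via the php-amqplib library (the queue name and message format are assumptions; the web front-end would publish one message per uploaded file):
require 'vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel    = $connection->channel();
$channel->queue_declare('audio_tasks', false, true, false, false); // durable queue
$channel->basic_qos(null, 1, null); // hand each worker one task at a time

$channel->basic_consume('audio_tasks', '', false, false, false, false,
    function (AMQPMessage $msg) {
        $audioId = $msg->getBody();
        // ... transcode $audioId, upload the results to S3, update the DB ...
        // Ack only after success, so a task that dies partway is redelivered.
        $msg->ack();
    }
);

while ($channel->is_consuming()) {
    $channel->wait();
}
On the web side, queuing a task is then a single basic_publish() call with a persistent delivery mode, so pending tasks survive a broker restart.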
Finally, you might find it faster and more efficient to use a service like Zencoder to do your transcoding, because they can parallelize the work and probably handle more input formats, but it may not be compatible with your processing.
You definitely want to hand the audio processing off to a background process.
Depending on the scalability involved, you might need a machine dedicated to the processing. You might also want to look into other resources you can offload audio work to (like PCIe DSP cards and such).
Sorry to say I know nothing about cross-domain file uploading or how the big dogs do it (YouTube, SoundCloud, etc.).

Common practice to compress images before sending to a mobile device?

My application requires downloading many images from the server (each image is about 10 KB). And I'm simply downloading each of them with an independent AsyncTask, without any optimization.
Now I'm wondering what the common practice is for transferring these images. For example, I'm thinking about saving zipped images on the server, then sending the zip for the user's device to unzip. In this case, is it better to combine the zips into one big zip file for the user to download?
Or is there a better solution? Thanks in advance!
EDIT:
It seems combining zip files is a good idea, but I feel it may take too long for the user to wait while all the images download and unzip. So I may put ten or twenty images in each zip file, so the user can see some downloaded images while waiting for more to arrive. Having multiple AsyncTasks fired together should be faster, right? But they won't finish at the same time, even given the same file size and the same download address?
Since latency is often the largest problem with mobile connections, reducing the number of connections you have to open is a great way to optimize loading times. Sending a zip file with all the images sounds like a very good idea and is probably worth the time to implement.
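If you go that route, here is a rough server-side sketch of the batching idea from your edit (PHP's ZipArchive is assumed, and the paths are placeholders):
// Bundle images in groups of twenty so the client can use early batches
// while later ones are still downloading.
function zipImageBatches(array $imagePaths, string $outDir, int $batchSize = 20): array
{
    $zips = [];
    foreach (array_chunk($imagePaths, $batchSize) as $i => $batch) {
        $zipPath = sprintf('%s/images_%03d.zip', $outDir, $i);
        $zip = new ZipArchive();
        $zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE);
        foreach ($batch as $path) {
            $name = basename($path);
            $zip->addFile($path, $name);
            // STORE rather than DEFLATE: JPG/PNG are already compressed, so
            // zipping saves connections, not bytes (PHP 7+ for this call).
            $zip->setCompressionName($name, ZipArchive::CM_STORE);
        }
        $zip->close();
        $zips[] = $zipPath;
    }
    return $zips;
}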
The images are probably already compressed (GIF, JPG, PNG), so you will not reduce the file size, but you will reduce the number of connections, which is a good idea for mobile. If it is always the same set of images, you can use sprite techniques: send one bigger image file containing all the images, and in HTML use the background property with an x/y offset to show the right image.
I was looking at the sidebar and saw this topic, but from the comments it seems you're really asking about patching in resources.
The best way is to make sure the user knows what to do with the download: you want the user to download file X and get output Y for a given purpose. That said, it appears to be common practice to download chunks of resources separately when they are not bundled in the Android app and cannot fit in the APK.
A comparable example is the JDIC apps, which use popular Japanese dictionary resources in tandem with English translations. JDIC apps like WWWJDIC use online downloads for the extremely large reference files that would otherwise suffer the bad latency (mentioned before) on Google's servers. It's also a bad look to have an app over 200 MB on Google Play unless it is 3D, which is justifiable. If your images cannot be compressed without extremely long loading times in the app itself, you may need to consider this option. The only downside is having to request an online connection (also mentioned before).
Also, you could use 7zip and program Android to self-extract it to a location. http://www.wikihow.com/Use-7Zip-to-Create-Self-Extracting-excutables
On another note, it would be optimal to do a one-time download on initial startup and then have the app perform routine checks for updates. You can put the download in an AsyncTask so that your files are downloaded to the app and used after restart (or however you want it), so you really need only one AsyncTask. The benefit is that the user syncs once and only needs to check occasionally. The downside is that the user may not always be able to update and may need to use 4G or LTE, but that is a minor concern if they can use WiFi whenever they want.

Get Server to Accept Multipart Uploads for Large Files with Abort/Resume Functionality like S3

Amazon S3 has a very nice feature that allows the upload of files in parts for larger files. This would be very useful to me if I were using S3, but I am not.
Here's my problem: I am going to have Android phones uploading reasonably large files (~50 MB each of binary data) on a semi-regular basis. Because these phones are using the mobile network to do this, and the coverage is spotty in some of the places where they're being used, a strong signal cannot be guaranteed. Therefore, doing a simple PUT with all 50 MB of data in the content body will not work very well. I need to split up the data somehow (probably into 10 MB chunks) and upload them whenever the signal will allow it. Once all of the chunks have been uploaded, they need to be merged into a single file.
I have a basic understanding of how the client needs to behave to support this through reading Amazon's S3 Client APIs, but have no idea what the server is doing to allow this. I'm willing to write the server in Python or PHP. Are there any libraries out there for either language to allow this sort of thing? I couldn't find anything after about one hour of searching.
Basically, I'm looking for anything that can help point me in the right direction. Information on this and what protocols and headers to use to make this as RESTful as possible would be fantastic. Thanks!
From the REST API documentation for multipart upload, it seems that Amazon expects the client to break the large file into multiple smaller parts and upload them individually. Prior to uploading, you obtain an upload ID, and on every upload you include the upload ID and a part number for the portion of the file being uploaded.
You'll probably have to structure yours the same way: a client that splits a huge file into multiple parts and uploads them in parallel using the convention above, and a server that stores each part and merges them once the client signals that all parts have arrived.
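As a rough sketch of that server side in PHP (the URL scheme, directory layout, and file names here are all assumptions, loosely modeled on S3's convention):
// upload.php - each PUT to upload.php?upload_id=X&part=N stores one chunk.
$uploadId = basename($_GET['upload_id']); // basename() strips path separators
$part     = (int) $_GET['part'];
$dir      = "/var/uploads/$uploadId";
if (!is_dir($dir)) {
    mkdir($dir, 0700, true);
}
// Store the raw request body; re-sending a failed part simply overwrites it,
// which is what makes the upload resumable.
file_put_contents("$dir/part-$part", file_get_contents('php://input'));

// complete.php - called once the client reports that every part is uploaded;
// concatenates parts 1..N in order into the final file.
$uploadId = basename($_GET['upload_id']);
$dir = "/var/uploads/$uploadId";
$out = fopen("/var/files/$uploadId.bin", 'wb');
for ($part = 1; file_exists("$dir/part-$part"); $part++) {
    $in = fopen("$dir/part-$part", 'rb');
    stream_copy_to_stream($in, $out);
    fclose($in);
}
fclose($out);
Because each part is idempotent, the phone can retry any chunk after losing signal without corrupting the final file.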

Video Uploading: recommended process?

Hey just a quick question for anyone who has done this. I want to create a video tube site. I have done file uploads before but was wondering if anyone could give me suggestions on what I am planning to do.
The way I am planning it is to have a folder in my web directory and to upload videos into the folder after virus-scanning them and checking the MIME type. The video will then be converted and compressed into FLV using FFmpeg.
I will change the name and store the video's reference ID in MySQL so the file name can be fetched and served.
I will serve the files to a Flash player using HTTP_Download:
require_once 'HTTP/Download.php'; // PEAR HTTP_Download

$dl = new HTTP_Download();
$dl->setFile($path);
// Send just the file name, not the full server path, in the disposition header
$dl->setContentDisposition(HTTP_DOWNLOAD_ATTACHMENT, basename($path));
$dl->setContentType('video/x-flv'); // 'video/flv' is not a registered MIME type
$dl->send();
Anyone have any suggestions? Is it a good idea to put all videos in one directory?
You may want to consider a Java-based uploader, as PHP can run into timeout problems on large uploads.
Also, do your FFmpeg processing as a cron job, not at upload time, as it takes a long time.
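A minimal sketch of that cron approach, assuming a videos table with id, src_path, and status columns (all names here are placeholders):
// transcode.php - run from cron, e.g. every minute. Picks one pending video,
// converts it to FLV with FFmpeg, and records the result. A real job should
// also lock the row so overlapping cron runs don't grab the same video.
$db = new PDO('mysql:host=localhost;dbname=videos', 'user', 'pass');
$row = $db->query("SELECT id, src_path FROM videos WHERE status = 'pending' LIMIT 1")
          ->fetch(PDO::FETCH_ASSOC);
if ($row) {
    $db->prepare("UPDATE videos SET status = 'processing' WHERE id = ?")
       ->execute([$row['id']]);
    $out = '/var/media/' . $row['id'] . '.flv';
    // escapeshellarg() guards against shell injection via odd file names.
    exec(sprintf('ffmpeg -i %s -b:v 500k %s',
        escapeshellarg($row['src_path']), escapeshellarg($out)), $o, $exitCode);
    $db->prepare("UPDATE videos SET status = ? WHERE id = ?")
       ->execute([$exitCode === 0 ? 'ready' : 'failed', $row['id']]);
}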
Look into something like Wowza Streaming Server to serve the videos. It allows streaming, and everything is stored above the web root. I name each video with a UID and send a parameter to the Flash video player to decide which one to play.
Where and how you store them will largely depend on how secure they need to be (i.e., should people be able to access the files in the directory directly, or should they be stored more securely than that?).
If direct access is fine, then putting them all in one folder is okay. If not, then you may want to obscure folder names, store them in a secure database, or put them in a folder that is not accessible from outside the server.
Also, I'm hoping you're aware of the massive amounts of storage space and bandwidth such a service will consume? I hope you have a scaled solution ready to deploy if you're really serious about this.

Is there a way to allow users of my site to download large volumes of image files from Amazon S3 via Flash / PHP / other service?

My website allows users to upload photographs, which I store on Amazon S3. I store the original upload as well as an optimized image and a thumbnail. I want to allow users to export all of their original versions when their subscription expires. So I am thinking the following problems arise:
Could be a large volume of data (possibly around 10GB)
How to manage the download process - e.g. if it gets interrupted, knowing where to start from again, and how to verify successful download of the files
Should this be done with individual files, or should I zip the files and download them as one file or a series of smaller zipped files?
Are there any tools out there that I can use for this? I have seen Fzip, which is an ActionScript library for handling zip files. I have an EC2 instance running that handles file uploads, so I could use this for downloads also - e.g. copy files from S3 to EC2, zip them, then download them to the user via a Flash downloader, and use Fzip to uncompress the zip folder to the user's hard drive.
Has anyone come across a similar service / solution?
All input appreciated, thanks!
I have not dealt with this problem directly, but my initial thoughts are:
Flash or possibly jQuery could be leveraged for a homegrown solution, with the client sending back information on what it has received and that information being stored in a database log. You might also consider using BitTorrent as a mediator: your users could download a free torrent client, and you could investigate a server-side torrent service (maybe RivetTracker or PHPBTTracker). I'm not sure how detailed these get, but at the very least, since you are assured you are dealing with a single user, once they become a seeder you can wipe the old file and begin on the next.
Break files larger than 2 GB into 2 GB chunks to accommodate users with FAT32 drives, which can't handle files over ~4 GB. Break them down to 1 GB chunks if space on the server is limited, keeping a record in the database of what's been zipped from S3.
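A quick sketch of that splitting step (chunk size and naming are arbitrary; 64-bit PHP is assumed for the 2 GB constant):
// Split a large archive into 2 GB pieces so FAT32 users can store them.
function splitFile(string $path, int $chunkBytes = 2147483648): void
{
    $in = fopen($path, 'rb');
    for ($i = 0; !feof($in); $i++) {
        $partPath = sprintf('%s.part%02d', $path, $i);
        $out = fopen($partPath, 'wb');
        $copied = stream_copy_to_stream($in, $out, $chunkBytes); // one chunk max
        fclose($out);
        if ($copied === 0) {
            unlink($partPath); // the previous chunk ended exactly at EOF
        }
    }
    fclose($in);
}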
Fzip is cool, but I think it's more for client-side archiving. PHP has ZIP and RAR libraries (http://php.net/manual/en/book.zip.php) you can use to round up files server-side. I think any solution you find will require you to manage security on your own by keeping database records of who's got what, along with download keys. Not doing so may lead to people leeching your resources as a file-delivery system.
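For the download-key bookkeeping, something along these lines would work (a sketch; the table and column names are made up, and $userId/$zipPath come from your app):
$db = new PDO('mysql:host=localhost;dbname=exports', 'user', 'pass');
// When an archive is ready: issue a random key tied to the user and file,
// then hand the user a link like https://example.com/download.php?key=...
$key = bin2hex(random_bytes(16)); // PHP 7+; an unguessable 32-char token
$db->prepare('INSERT INTO download_keys (user_id, zip_path, dl_key) VALUES (?, ?, ?)')
   ->execute([$userId, $zipPath, $key]);

// download.php: look up the key and stream the file only if it is valid.
$stmt = $db->prepare('SELECT zip_path FROM download_keys WHERE dl_key = ?');
$stmt->execute([$_GET['key'] ?? '']);
if ($path = $stmt->fetchColumn()) {
    header('Content-Type: application/zip');
    header('Content-Disposition: attachment; filename="' . basename($path) . '"');
    header('Content-Length: ' . filesize($path));
    readfile($path);
} else {
    http_response_code(403); // unknown or revoked key
}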
Good luck!
