Mounting Amazon S3 bucket for use in web application - php

I will be launching a web application soon which will require users to upload pictures for others to view. I would like to use Amazon S3 to store the images as it scales and is cheap. The user will use a form to upload their file, it will be processed with php and saved to the S3 mount thats attached to the web server.
I am anticipating and hoping tens or hundreds of thousands of images will eventually be uploaded.
My first question is whether an S3 bucket mount is robust and fast enough for such an application, or would I be better off using Amazon EBS. Although I'd like to have my own dedicated box rather than use an EC2 instance.
Also, I am at this point unfamiliar with S3, but when I do upload files, is it appropriate to put them in a single bucket rather than a cascade of directories? It seems it might be ok since each 'bucket' is virtual anyway.

One of the things you can do is to have your users upload to S3 bucket directly, unless you want to do some processing. You can use POST to upload the files to S3 or one of the 3rd party components such as http://flajaxian.com/ This way you can significantly offload your server.
As for your second question it is actually up to you how your design your app. there is no pros and cons.

Related

S3 or EFS - Which one is best to create dynamic image from set of images?

I have a quiz site which creates images from a set of source images, the result images are stored in S3 and i don't care about it. My question is about source images, S3 or EFS is better for storing the source images for this purpose. I am using php to create result images.
Here's a general rule for you: Always use Amazon S3 unless you have a reason to do otherwise.
Why?
It has unlimited storage
The data is replicated for resilience
It is accessible from anywhere (given the right permissions)
It has various cost options
Can be accessed by AWS Lambda functions
The alternative is a local disk (EBS) or a shared file system (EFS). They are more expensive, can only be accessed from EC2 and take some amount of management. However, they have the benefit that they act as a directly-attached storage device, so your code can reference it directly without having to upload/download.
So, if your code needs the files locally, the EFS would be a better choice. But if you code can handle S3 (download from it, use the files, upload the results), then S3 is a better option.
Given your source images will (presumably) be at a higher resolution than those you are creating, and that once processed, they will not need to be accessed regularly after while (again, presumably), I would suggest that the lower cost of S3 and the archiving options available there means it would be best for you. There's a more in depth answer here:
AWS EFS vs EBS vs S3 (differences & when to use?)

Uploading to an AWS S3 bucket using a Job Queues

I'm developing an application that will run in a elastic environment on AWS (Ec2 instances with autoscaling). All the app is being developed in PHP.
The core of the app is based on safely storing files in a S3 bucket. As the user doesn't needs to know where was it saved, I thought that I could make this store the file temporarily in the EC2 instance and then asynchronously move it to S3, using a job queue (Amazon SQS) to avoid duplicating the wait time and having better support for s3 problems (they aren't common, but can happen).
My questions are:
Does this approach sounds good or I'm missing something?
When processing the job from the queue, the worker instance will have to connect to the original s3 instance, retrieve the file from it and then upload it to s3?
How can avoid having problems when the autoscaling? An instance could be deleted before I store the file in the S3 bucket.
Ideally, you don't want your main app server being tied during file uploads (both to the app server and subsequently to S3).
CORS (Cross Origin Resource Sharing) exists to avoid precisely this. You can upload the file to S3 directly from the client-side and let amazon worry about handling multiple uploads from your concurrent users. It lets your app do what it does best without having to worry about the uploads themselves.
This SO question discusses the same issue and there are several customisable plugins like fine uploader out there which can wrap around this with progress bars, etc.
This completely removes the need to make use of any kind of queue. If you need to do certain bookkeeping operations after the upload, you could simply make an ajax call to your server after the upload is complete with the file info, etc. It should also address any concerns you might have with instances being removed due to autoscaling since everything is client side.

How can I mount an S3 bucket to an EC2 instance and write to it with PHP?

I'm working on a project that is being hosted on Amazon Web Services. The server setup consists of two EC2 instances, one Elastic Load Balancer and an extra Elastic Block Store on which the web application resides. The project is supposed to use S3 for storage of files that users upload. For the sake of this question, I'll call the S3 bucket static.example.com
I have tried using s3fs (https://code.google.com/p/s3fs/wiki/FuseOverAmazon), RioFS (https://github.com/skoobe/riofs) and s3ql (https://code.google.com/p/s3ql/). s3fs will mount the filesystem but won't let me write to the bucket (I asked this question on SO: How can I mount an S3 volume with proper permissions using FUSE). RioFS will mount the filesystem and will let me write to the bucket from the shell, but files that are saved using PHP don't appear in the bucket (I opened an issue with the project on GitHub). s3ql will mount the bucket, but none of the files that are already in the bucket appear in the filesystem.
These are the mount commands I used:
s3fs static.example.com -ouse_cache=/tmp,allow_other /mnt/static.example.com
riofs -o allow_other http://s3.amazonaws.com static.example.com /mnt/static.example.com
s3ql mount.s3ql s3://static.example.com /mnt/static.example.com
I've also tried using this S3 class: https://github.com/tpyo/amazon-s3-php-class/ and this FuelPHP specific S3 package: https://github.com/tomschlick/fuel-s3. I was able to get the FuelPHP package to list the available buckets and files, but saving files to the bucket failed (but did not error).
Have you ever mounted an S3 bucket on a local linux filesystem and used PHP to write a file to the bucket successfully? What tool(s) did you use? If you used one of the above mentioned tools, what version did you use?
EDIT
I have been informed that the issue I opened with RioFS on GitHub has been resolved. Although I decided to use the S3 REST API rather than attempting to mount a bucket as a volume, it seems that RioFS may be a viable option these days.
Have you ever mounted an S3 bucket on a local linux filesystem?
No. It's fun for testing, but I wouldn't let it near a production system. It's much better to use a library to communicate with S3. Here's why:
It won't hide errors. A filesystem only has a few errors codes it can send you to indicate a problem. An S3 library will give you the exact error message from Amazon so you understand what's going on, log it, handle corner cases, etc.
A library will use less memory. Filesystems layers will cache lots of random stuff that you many never use again. A library puts you in control to decide what to cache and not to cache.
Expansion. If you ever need to do anything fancy (set an ACL on a file, generate a signed link, versioning, lifecycle, change durability, etc), then you'll have to dump your filesystem abstraction and use a library anyway.
Timing and retries. Some fraction of requests randomly error out and can be retried. Sometimes you may want to retry a lot, sometimes you would rather error out quickly. A filesystem doesn't give you granular control, but a library will.
The bottom line is that S3 under FUSE is a leaky abstraction. S3 doesn't have (or need) directories. Filesystems weren't built for billions of files. Their permissions models are incompatible. You are wasting a lot of the power of S3 by trying to shoehorn it into a filesystem.
Two random PHP libraries for talking to S3:
https://github.com/KnpLabs/Gaufrette
https://aws.amazon.com/sdkforphp/ - this one is useful if you expand beyond just using S3, or if you need to do any of the fancy requests mentioned above.
Quite often, it is advantageous to write files to the EBS volume, then force subsequent public requests for the file(s) to route through CloudFront CDN.
In that way, if the app must do any transformations to the file, it's much easier to do on the local drive & system, then force requests for the transformed files to pull from the origin via CloudFront.
e.g. if your user is uploading an image for an avatar, and the avatar image needs several iterations for size & crop, your app can create these on the local volume, but all public requests for the file will take place through a cloudfront origin-pull request. In that way, you have maximum flexibility to keep the original file (or an optimized version of the file), and any subsequent user requests can either pull an existing version from cloud front edge, or cloud front will route the request back to the app and create any necessary iterations.
An elementary example of the above would be WordPress, which creates multiple sized/cropped versions of any graphic image uploaded, in addition to keeping the original (subject to file size restrictions, and/or plugin transformations). CDN-capable WordPress plugins such as W3 Total Cache rewrite requests to pull through CDN, so the app only needs to create unique first-request iterations. Adding browser caching URL versioning (http://domain.tld/file.php?x123) further refines and leverages CDN functionality.
If you are concerned about rapid expansion of EBS volume file size or inodes, you can automate a pruning process for seldom-requested files, or aged files.

Amazon S3 Cloud architecture

I'm writing an app that let's you share files in the cloud.
You make a user-account and can upload your files with and send links to friends.
I'm using Amazon S3 to save the data.
But I'm not sure how I should proceed.
There are buckets, which you can create in S3, and in those buckets you save your files.
I thought about making a bucket for each user, but then I read that you can only have 100 buckets at a time.
Isn't there a better way to managing this then to just save all user files in one "directory".
This will get so messy. I have never used S3 before, I would be very thankful for any advice.
And if this is the only way, what naming convention proved to be the best?
Thanks!
Even though S3 has a flat structure within a bucket, each object has its own path much like the directories you're used to.
So you can structure your paths like so:
/<user-id>/<album-1>/...
One thing to keep in mind is that not all directory related features are available, such as:
Deny access to /<user-123>/*,
Copy from one directory to another.

Using Amazon S3 php REST API vs. Mounting S3 bucket to server (s3fs)

I will be launching an application in the very near future which will, in part, require users to upload files (images) to be viewed by other members. I like the idea of S3 as it is relatively cheap and scales automatically.
My problem is how I will have users upload their images to S3. It seems there are a few options.
1- Use the php REST API. The only problem is that I can't get it to work for uploading variously scaled versions (ie thumbnails) of the same image simultaneously and uploading them directly to s3 (it works for just one image at a time this way). Overall, it just seems less flexible.
http://net.tutsplus.com/tutorials/php/how-to-use-amazon-s3-php-to-dynamically-store-and-manage-files-with-ease/
2- The other option would be to mount an S3 bucket with s3fs. Then just programmatically move my images into the bucket like I would with NFS. From what I've read, it seems some people are dubious of the reliability of mounting S3. Is this true?
http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=fuse+over+amazon
Which method would be better for maximum reliability and speed?
Would EBS be something to consider? I would really like to have a dedicated box rather than use an EC2 instance, though...
For your use case I recommend to use the S3 API directly rather than using s3fs because of performance. Remember that s3fs is just another layer on top of S3's API and it's usage of that API is not always the best one for your application.
To handle the creation of thumbnails, I recommend to decouple that from the main upload process by using Amazon Simple Queue Service. That way your users will receive a response as soon as a file is uploaded without having to wait for it to be processed resulting in shorter response times.
As for using EBS, that is a different scenario. EBS is just a persistent storage for and Amazon EC2 instance and it's reliability doesn't compare with S3.
It's also important to remember that S3 only offers "eventual consistency" as opposed to a physical HDD on your machine or an EBS instance on EC2 so you need to code your app to handle that correctly.

Categories