AWS OpsWorks - Linux drive to S3 Bucket - php

What would be the best way to 'symlink', so to speak, a drive on an AWS Linux instance to an S3 bucket?
Currently I am utilizing Apache's ProxyPass to reverse proxy /images/ over to the S3 bucket with the following code.
ProxyRequests Off
<Proxy *>
Order deny,allow
Allow from all
</Proxy>
ProxyPass /images/ http://s3bucket.s3-website-us-east-1.amazonaws.com/
This works great, but only for HTTP access through Apache. If I want to write to the /images/ directory from PHP itself, it gets written locally.
So I'm looking for a good way to have /images/ redirected to the S3 bucket for all types of operations. I've looked at S3FS but I'm leery of going that route.
I am also utilizing OpsWorks to manage my EC2 instances, so whatever method I go with needs to be automatable by OpsWorks and/or Chef recipes.
Another option I looked at was the AWS SDK for PHP. That would require changing the PHP code that writes files to the /images/ directory. It may be the solution in the long run, but for now I need another approach.

While s3fs can sort of fake a mapping to S3, you are going to run into problems. The issue is that S3 is not at all like a disk volume.
With a disk volume, you can write and read from the disk in blocks. You can also lock files among other operations.
S3 on the other hand, is an object storage system. It operates on whole files.
s3fs sort of makes it work by copying the file into your temp directory so you can write to it, and then copying it back to S3 after you are done.
Simple operations may work OK, such as reading or writing an entire file at a time, but anything that works at the block level may fail.
I do recommend updating your PHP code to use the existing clients provided by Amazon; it's quite easy to do. Instead of move_uploaded_file, you would use the client code.
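For example, here is a minimal sketch of that change, assuming the AWS SDK for PHP v3 installed via Composer and an instance role (or OpsWorks-managed credentials) that allows s3:PutObject; the bucket name and form field are placeholders:

<?php
// Sketch only: bucket name, form field name and key prefix are assumptions.
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'us-east-1',
    // No explicit credentials: the SDK picks them up from the instance profile.
]);

// Instead of: move_uploaded_file($_FILES['image']['tmp_name'], '/var/www/images/' . $name);
$name = basename($_FILES['image']['name']);

$s3->putObject([
    'Bucket'      => 's3bucket',                    // placeholder bucket name
    'Key'         => 'images/' . $name,             // matches the /images/ proxy path
    'SourceFile'  => $_FILES['image']['tmp_name'],  // upload straight from PHP's temp file
    'ContentType' => $_FILES['image']['type'],
    'ACL'         => 'public-read',                 // only needed if the bucket relies on object ACLs
]);

With the ProxyPass rule above, the file is then immediately available under /images/ without ever being written to the local disk.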

Related

Saving files to Amazon S3 using S3FS with PHP on Red Hat Linux and files being overwritten with nothing

When writing a file to S3 using S3FS, if that file is accessed while writing to it, the data in the file is deleted.
We noticed this issue on a Red Hat Linux server where we kept a product we were beta testing. When we went to fix it, we moved that product to an Ubuntu instance and we no longer have the issue.
We have since set up a server for a client that wanted Red Hat, moved some code to it, and that server is now having the same overwrite issue.
The behavior you describe makes sense. A bit of explanation of how S3 works vs standard volumes is required.
A standard volume can be read/written by the OS at a block level. Multiple processes can access the file, but some locks are required to prevent corruption.
S3 treats operations as whole files. Either the file gets uploaded in its entirety or it doesn't exist at all.
s3fs tries to create an interface to something that isn't a volume so that you can mount it on the file system. But under the covers, it copies each file you access to the local file system and stores it in a temp directory. While you can generally do whole-file operations with s3fs (copying, deleting, etc.), trying to open a file directly from s3fs to do block-level operations is going to end badly.
There are other options. Reworking your code to pull files from and push files to S3 directly can work, but it sounds like you need something that behaves more like NFS.

Amazon S3 mount with FUSE and s3fs

I have an Nginx server with php-fpm installed on CentOS 6.4.
1. Current status:
I use an NFS server to hold versions and files; Nginx mounts the NFS share and serves files from it.
This scenario works and handles large volumes of traffic.
2. Desired scenario
Replace the NFS with S3 and mount it using fuse & s3fs.
In that scenario the server fails when forced to handle high traffic.
Is fuse & s3fs much slower, or am I missing something?
Thx
The problem is in thinking that S3 behaves like an NFS mount. It does not. NFS and other disk volumes work on blocks; S3 works on entire objects. Generally, when accessing files through s3fs, the entire file is copied to /tmp, and it has much higher latency than disk access.
A couple of solutions:
If you prefer NFS, you can set up an NFS share from a separate instance and mount it on the instances that need it.
Otherwise, you can deploy your code directly to the instance itself (which is something you can automate) and run it from there. Static assets could be served directly from S3 via CloudFront, or from your instances with CloudFront using a custom origin.
Take a look at the RioFS project; it allows you to mount an S3 bucket as a local directory (we are using FUSE).
As datasage mentioned earlier, you can't really compare NFS with S3, as these two are completely different filesystems used in different scenarios. RioFS lets you upload / download files and list directory content. Anything extra (like appending data to a file) is not supported.

How can I mount an S3 bucket to an EC2 instance and write to it with PHP?

I'm working on a project that is being hosted on Amazon Web Services. The server setup consists of two EC2 instances, one Elastic Load Balancer and an extra Elastic Block Store on which the web application resides. The project is supposed to use S3 for storage of files that users upload. For the sake of this question, I'll call the S3 bucket static.example.com
I have tried using s3fs (https://code.google.com/p/s3fs/wiki/FuseOverAmazon), RioFS (https://github.com/skoobe/riofs) and s3ql (https://code.google.com/p/s3ql/). s3fs will mount the filesystem but won't let me write to the bucket (I asked this question on SO: How can I mount an S3 volume with proper permissions using FUSE). RioFS will mount the filesystem and will let me write to the bucket from the shell, but files that are saved using PHP don't appear in the bucket (I opened an issue with the project on GitHub). s3ql will mount the bucket, but none of the files that are already in the bucket appear in the filesystem.
These are the mount commands I used:
s3fs static.example.com -ouse_cache=/tmp,allow_other /mnt/static.example.com
riofs -o allow_other http://s3.amazonaws.com static.example.com /mnt/static.example.com
s3ql mount.s3ql s3://static.example.com /mnt/static.example.com
I've also tried using this S3 class: https://github.com/tpyo/amazon-s3-php-class/ and this FuelPHP specific S3 package: https://github.com/tomschlick/fuel-s3. I was able to get the FuelPHP package to list the available buckets and files, but saving files to the bucket failed (but did not error).
Have you ever mounted an S3 bucket on a local linux filesystem and used PHP to write a file to the bucket successfully? What tool(s) did you use? If you used one of the above mentioned tools, what version did you use?
EDIT
I have been informed that the issue I opened with RioFS on GitHub has been resolved. Although I decided to use the S3 REST API rather than attempting to mount a bucket as a volume, it seems that RioFS may be a viable option these days.
Have you ever mounted an S3 bucket on a local linux filesystem?
No. It's fun for testing, but I wouldn't let it near a production system. It's much better to use a library to communicate with S3. Here's why:
It won't hide errors. A filesystem only has a few error codes it can send you to indicate a problem. An S3 library will give you the exact error message from Amazon so you understand what's going on, log it, handle corner cases, etc.
A library will use less memory. Filesystem layers will cache lots of random stuff that you may never use again. A library puts you in control of what to cache and what not to cache.
Expansion. If you ever need to do anything fancy (set an ACL on a file, generate a signed link, versioning, lifecycle, change durability, etc), then you'll have to dump your filesystem abstraction and use a library anyway.
Timing and retries. Some fraction of requests randomly error out and can be retried. Sometimes you may want to retry a lot, sometimes you would rather error out quickly. A filesystem doesn't give you granular control, but a library will.
The bottom line is that S3 under FUSE is a leaky abstraction. S3 doesn't have (or need) directories. Filesystems weren't built for billions of files. Their permissions models are incompatible. You are wasting a lot of the power of S3 by trying to shoehorn it into a filesystem.
Two random PHP libraries for talking to S3:
https://github.com/KnpLabs/Gaufrette
https://aws.amazon.com/sdkforphp/ - this one is useful if you expand beyond just using S3, or if you need to do any of the fancy requests mentioned above (see the sketch below).
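To make the "fancy requests" point concrete, here is a rough sketch using the AWS SDK for PHP v3 (bucket and key names are placeholders); none of this is expressible through a filesystem mount:

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client(['version' => 'latest', 'region' => 'us-east-1']);

// Set an ACL on an existing object.
$s3->putObjectAcl([
    'Bucket' => 'static.example.com',
    'Key'    => 'uploads/report.pdf',
    'ACL'    => 'private',
]);

// Generate a signed link to that private object, valid for 10 minutes.
$cmd = $s3->getCommand('GetObject', [
    'Bucket' => 'static.example.com',
    'Key'    => 'uploads/report.pdf',
]);
$signedUrl = (string) $s3->createPresignedRequest($cmd, '+10 minutes')->getUri();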
Quite often, it is advantageous to write files to the EBS volume, then force subsequent public requests for the file(s) to route through CloudFront CDN.
In that way, if the app must do any transformations to the file, it's much easier to do on the local drive & system, then force requests for the transformed files to pull from the origin via CloudFront.
e.g. if your user is uploading an image for an avatar, and the avatar image needs several iterations for size & crop, your app can create these on the local volume, but all public requests for the file will take place through a CloudFront origin-pull request. In that way, you have maximum flexibility to keep the original file (or an optimized version of the file), and any subsequent user requests can either pull an existing version from a CloudFront edge, or CloudFront will route the request back to the app and create any necessary iterations.
An elementary example of the above would be WordPress, which creates multiple sized/cropped versions of any graphic image uploaded, in addition to keeping the original (subject to file size restrictions and/or plugin transformations). CDN-capable WordPress plugins such as W3 Total Cache rewrite requests to pull through the CDN, so the app only needs to create unique first-request iterations. Adding browser-caching URL versioning (http://domain.tld/file.php?x123) further refines and leverages CDN functionality.
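A hypothetical helper for that last idea, with the CloudFront hostname and the use of filemtime() as the version both being assumptions:

<?php
// Build a CDN URL with a version query string so browser and CloudFront caches
// are busted whenever the local copy on the EBS volume changes.
function cdn_url($path, $cdnHost = 'https://dexample123.cloudfront.net')
{
    $localFile = '/var/www/public' . $path;            // the copy on the EBS volume
    $version   = is_file($localFile) ? filemtime($localFile) : 1;

    return $cdnHost . $path . '?v=' . $version;        // e.g. .../avatar-150x150.jpg?v=1381234567
}

// Usage in a template:
// echo '<img src="' . cdn_url('/uploads/avatar-150x150.jpg') . '" alt="avatar">';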
If you are concerned about rapid expansion of EBS volume file size or inodes, you can automate a pruning process for seldom-requested files, or aged files.

PHP temp file upload directory - off local server

When uploading an image, PHP stores the temp image in a local directory on the server.
Is it possible to change this temp location so it's off the local server?
Reason: I'm using load balancing without sticky sessions and I don't want files to be uploaded to one server and then not available on another server. Note: I don't necessarily complete the file upload and work on the file in one go.
Preferred temp location would be AWS S3 - also just interested to know if this is possible.
If it's not possible, I could make the file upload a complete process that also puts the finished file in the final location.
I'm just interested to know if the PHP temp image/file location can be off the local server?
Thank you
You can mount the S3 bucket with s3fs on your instances behind the ELB, so that all your uploads are shared between application servers. As for /tmp, don't touch it; since the destination is S3 and it is shared, you don't have to worry.
If you have a lot of uploads, S3 might be a bottleneck. In that case, I suggest setting up a NAS. Personally, I use GlusterFS because it scales well and is very easy to set up. It has replication issues, but if you don't use replicated volumes at all, you are fine.
Another alternatives are Ceph, Sector/Sphere, XtreemFS, Tahoe-LAFS, POHMELFS and many others...
You can directly upload a file from a client to S3 with some newer technologies, as detailed in this post (a rough sketch of the server-side signing appears after this answer):
http://www.ioncannon.net/programming/1539/direct-browser-uploading-amazon-s3-cors-fileapi-xhr2-and-signed-puts/
Otherwise, I personally would suggest using each server's tmp folder for exactly that: temporary storage. After the file is on your server, you can always upload it to S3, which would then be accessible across all of your load-balanced servers.
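For the direct-to-S3 option linked above, a rough server-side sketch assuming the AWS SDK for PHP v3 and a CORS rule on the bucket that allows PUT from your origin (bucket name and key scheme are placeholders):

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client(['version' => 'latest', 'region' => 'us-east-1']);

// Pre-sign a PUT so the browser can upload straight to S3, bypassing the web servers.
$cmd = $s3->getCommand('PutObject', [
    'Bucket'      => 'uploads.example.com',         // placeholder bucket
    'Key'         => 'tmp/' . uniqid('upload_'),    // throwaway key for the raw upload
    'ContentType' => 'image/jpeg',                  // the browser must send the same Content-Type
]);

$uploadUrl = (string) $s3->createPresignedRequest($cmd, '+15 minutes')->getUri();

// Hand the URL to the browser, which PUTs the file body to it via XHR.
header('Content-Type: application/json');
echo json_encode(['url' => $uploadUrl]);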

Amazon S3 Cloud architecture

I'm writing an app that lets you share files in the cloud.
You make a user account and can upload your files and send links to friends.
I'm using Amazon S3 to save the data.
But I'm not sure how I should proceed.
There are buckets, which you can create in S3, and in those buckets you save your files.
I thought about making a bucket for each user, but then I read that you can only have 100 buckets at a time.
Isn't there a better way to manage this than to just save all user files in one "directory"?
This will get so messy. I have never used S3 before, I would be very thankful for any advice.
And if this is the only way, what naming convention proved to be the best?
Thanks!
Even though S3 has a flat structure within a bucket, each object has its own path much like the directories you're used to.
So you can structure your paths like so:
/<user-id>/<album-1>/...
One thing to keep in mind is that not all directory-related features are available, such as:
Deny access to /<user-123>/*,
Copy from one directory to another.
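Building on that layout, here is a short sketch assuming the AWS SDK for PHP v3 and a single placeholder bucket; keys take the form <user-id>/<album>/<filename> even though S3 has no real directories:

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3     = new S3Client(['version' => 'latest', 'region' => 'us-east-1']);
$bucket = 'myapp-user-files';                      // placeholder bucket name

// Save a file under the user's "directory".
$s3->putObject([
    'Bucket'     => $bucket,
    'Key'        => '42/holiday-2013/beach.jpg',   // <user-id>/<album>/<file>
    'SourceFile' => '/tmp/beach.jpg',
]);

// List everything that belongs to user 42 by filtering on the key prefix.
$result = $s3->listObjectsV2([
    'Bucket' => $bucket,
    'Prefix' => '42/',
]);

foreach ($result['Contents'] ?? [] as $object) {
    echo $object['Key'], "\n";
}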
