AWS S3 / EC2 Dilemma - PHP

Recently my Web design firm got a big contract to build a media-rich website that needs to run on WordPress. The client wants this because of their familiarity with WordPress and its simplicity.
The hosting will undoubtedly be on AWS EC2, and we are now torn between hosting the actual files on a separate instance or in an S3 bucket. I have never worked with S3, but I have 2+ years of experience with EC2. Users uploading images, videos, documents, etc. will be a big component of the website.
ANTICIPATED: Based on the market study done by another firm for the client, we expect upwards of 1,000 unique visitors daily, of whom 5-10% would be uploading to the server/bucket.
AIM: A fast website with that kind of media richness.
Any advice on the choice of server and infrastructure settings?

WordPress does, however, store all of its files in the local file system by default. You can get plugins that let uploads be stored in S3, although with only 1,000 uniques it may not be necessary.
The biggest gain in speed is going to come from caching systems (preferably caching to memory).
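For what it's worth, those S3-offload plugins are doing roughly this under the hood. A minimal sketch with the AWS SDK for PHP; the bucket name, region, and paths are illustrative:

```php
<?php
// Offload a finished upload from the local wp-content tree into S3.
// Sketch only: bucket, region, and file paths are illustrative.
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'us-east-1',
]);

$s3->putObject([
    'Bucket'     => 'example-media-bucket',
    'Key'        => 'uploads/2014/03/photo.jpg',
    'SourceFile' => '/var/www/wp-content/uploads/2014/03/photo.jpg',
    'ACL'        => 'public-read', // so the file can be served directly from S3
]);
```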

There are many options open to you, but with something like 1,000 uniques per day, you don't have much to worry about. If you want to take advantage of S3 for serving your static content, then:
Create a bucket in S3 with public read access enabled
Mount this bucket using S3 FUSE in Linux (see the sanity check after this list) -> guide here:
http://juliensimon.blogspot.de/2013/08/howto-aws-mount-s3-buckets-from-linux.html
Ensure memory caching is enabled in WordPress (W3 Total Cache)
Minify the CSS and JS files using W3 Total Cache (be careful, as this sometimes breaks themes)
If the site is not responsive enough, consider using AWS CloudFront or CloudFlare
If the site must be online at all times, consider 2 instances with DNS round-robin. Keep WordPress synced using rsync, and ensure they both mount the same S3 bucket.
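Once the bucket is mounted, anything written under the mount point lands in S3 transparently. A quick sanity check in PHP; the mount path is illustrative and assumes the s3fs mount from the guide above:

```php
<?php
// Write through the FUSE mount and read it back; if this round-trips,
// WordPress uploads under this path will end up in the bucket too.
$mount = '/mnt/s3-media'; // wherever s3fs mounted the bucket (illustrative)

file_put_contents($mount . '/healthcheck.txt', 'ok ' . date('c'));
echo file_get_contents($mount . '/healthcheck.txt'), PHP_EOL;
```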
This should be more than enough.
Hope this helps.

1,000 visitors a day is not so large a strain on a server that I'd be especially worried about it. If it were me, I'd make sure to use caching (like datasage recommended), and also look into leveraging a CDN, especially since you're dealing with a lot of media. No matter what CDN you use, be it CloudFlare, MaxCDN, VideoPress, Amazon CloudFront, Akamai, or any one of many great content delivery network providers out there, I think you'll get a lot further with that than you will by tweaking your server. If you want to do that too, I'd suggest caching and NGINX. Obviously minify CSS and JS too, before you deploy, but that's kinda obvious.

I appreciate all the input; the consensus is that 1,000 uniques/day is not much of a deal. Check.
I should mention, though, that we'll build every single piece of functionality ourselves as one main plugin, to keep full control over the features. Many themes nowadays are stuffed with unnecessary junk, which doesn't help the case we're trying to make. I will look at StuartB's solution more closely, but I certainly do appreciate all of your inputs.

Related

What are the pros and cons to using AWS/S3 for static content?

I'd like a little guidance from you all. I have a multimedia-based site hosted on traditional Linux-based LAMP hosting. As the site is mostly image/video content, there are around 30,000+ posts; the database size is only around 20-25 MB, but file system usage is about 10 GB, and around 800-900 GB of bandwidth (of the allowed 1 TB) gets used every month.
Now, after a little brainstorming and looking at my alternatives here and there, I have come up with two options:
Increase / Get a bigger hosting plan.
Get my static content stored on Amazon S3.
While the first plan would be the simple option, I am actually leaning toward the second one, i.e. storing my static content on Amazon S3. The website I have is totally custom-coded, based on PHP+MySQL. I went through this http://undesigned.org.za/2007/10/22/amazon-s3-php-class/ and it gave me a fair idea.
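For reference, basic usage of that class looks something like this, as far as I can tell; the keys, bucket name, and paths are illustrative:

```php
<?php
// Push one static file to S3 with the undesigned.org.za S3 class.
// Sketch only: credentials, bucket, and paths are illustrative.
require_once 'S3.php';

$s3 = new S3('ACCESS_KEY', 'SECRET_KEY');

$s3->putObject(
    S3::inputFile('/var/www/media/video-001.mp4'), // local source
    'example-media-bucket',                        // destination bucket
    'videos/video-001.mp4',                        // object key
    S3::ACL_PUBLIC_READ                            // publicly readable
);
```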
I would love to know the pros and cons to consider when hosting static content on S3.
Please give your inputs.
Increase / Get a bigger hosting plan.
I would not do that. The reason is that storage is cheap, while the other components of a "bigger hosting plan" will cost you dearly without providing an immediate benefit (more memory is expensive if you don't need it).
Get my static content stored on Amazon S3.
This is the way to go. S3 is so inexpensive that it is a no-brainer. Having said that, since we are talking video here, I would recommend a third option:
[3.] Store video on AWS S3 and serve it through CloudFront. It is still rather inexpensive by comparison, given the spectacular bandwidth and global distribution. CloudFront is Amazon's CDN, for blazing-fast speeds to any location.
If you want to save on bandwidth, you may also consider using Amazon Elastic Transcoder for high-quality compression, which minimizes what you actually transfer.
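For what it's worth, kicking off a transcode with the AWS SDK for PHP looks roughly like this; the pipeline ID and preset ID are placeholders you would look up in your own account:

```php
<?php
// Queue one transcoding job: raw upload in, web-ready 720p out.
// Sketch only: pipeline ID, preset ID, and keys are illustrative.
require 'vendor/autoload.php';

use Aws\ElasticTranscoder\ElasticTranscoderClient;

$transcoder = new ElasticTranscoderClient([
    'version' => 'latest',
    'region'  => 'us-east-1',
]);

$transcoder->createJob([
    'PipelineId' => '1234567890123-abcdef',       // your S3-in/S3-out pipeline
    'Input'      => ['Key' => 'raw/video-001.mp4'],
    'Outputs'    => [[
        'Key'      => 'web/video-001-720p.mp4',
        'PresetId' => '1351620000001-000010',     // e.g. a generic 720p system preset
    ]],
]);
```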
Traditional hosting is way too expensive for this.
Bigger Hosting Plan
Going for a bigger hosting plan is not a permanent solution, because:
Static content (images/videos) always grows in size. This time your need is 1 TB; next time it will be more, so you will be back in the same situation.
With the growth of users and static content, your bandwidth use will also increase and will cost you more.
Your database is not so big, and we can assume you are not using a lot of CPU power or memory. So you would only be using more disk space while paying for larger CPU and memory that you do not use.
Technically, it is not good to serve all your requests from a single server: browsers have a limit on simultaneous requests per domain.
S3 / cloud storage for static content
S3 or other cloud storage is a good option for static content. The benefits:
You don't need to worry about storage space; it scales automatically and is available in abundance.
If your site is accessed from different locations worldwide, you can put a CDN in front of it to improve speed by delivering content from the nearest location.
The bandwidth is very cheap compared to traditional hosting.
Uploading files to and serving them from S3 also takes load off your own server.
These are some of the benefits of using S3 over traditional hosting; S3 is built specifically to serve static content. The decision is yours :)
If you're looking at the long term, at some point you might not be able to afford a server that will hold all of your data. I think S3 is a good option for a case like yours for the following reasons:
You don't need to worry about large file uploads tying up your server. With Cross-Origin Resource Sharing, you can upload files directly from the client to your S3 bucket (see the sketch after this list).
Modern browsers will often load parallel requests when a webpage requests content from different domains. If you have your pictures coming from yourbucket.s3.amazonaws.com and the rest of your website loading from yourdomain.com, your users might experience a shorter load time since these requests will be run in parallel.
At some point, you might want to use a Content Distribution Network (CDN) to serve your media. When this happens, you can use Amazon's CloudFront, which supports S3 out of the box, or another CDN; most popular CDNs these days do support serving content from S3 buckets.
It's a problem you'll never have to worry about. Amazon takes care of redundancy, availability, backups, failovers, etc. That's a big load off your shoulders, leaving you free to take care of other things, knowing your media is stored in a way that's scalable and future-proof (at least for the foreseeable future).
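A minimal sketch of that browser-direct upload, using the presigned-POST support in the AWS SDK for PHP. The bucket name and key prefix are illustrative, and the bucket needs a CORS rule allowing POST from your domain:

```php
<?php
// Generate the fields for an HTML form that posts straight to S3,
// so the upload never touches your own server. Sketch only.
require 'vendor/autoload.php';

use Aws\S3\S3Client;
use Aws\S3\PostObjectV4;

$s3 = new S3Client(['version' => 'latest', 'region' => 'us-east-1']);

$postObject = new PostObjectV4(
    $s3,
    'example-upload-bucket',
    ['acl' => 'private', 'key' => 'uploads/${filename}'], // form fields
    [                                                     // policy conditions
        ['acl' => 'private'],
        ['starts-with', '$key', 'uploads/'],
    ],
    '+15 minutes'                                         // policy expiry
);

// Render these into the upload <form> on your page
$formAttributes = $postObject->getFormAttributes();
$formInputs     = $postObject->getFormInputs();
```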

Managing images and other media for a large website

I am currently developing a big web application. Users will be able to post images as well as music files to my server. I am using PHP with the CodeIgniter framework and work off an Apache server from A2hosting.com. I was wondering how I will be able to manage space. I know that they offer unlimited storage, but I know that I am going to run into issues if too many people upload too much.
What is the best way to deal with this? Would you have a separate hosting plan for storing all this media? Could it be stored through a third party? Will my site eventually slow down because I am holding way too much data for people?
I guess I would like to know what issues I am going to run into. My project is almost complete, and I want to avoid any large-scale errors that may occur. I am the only one working on this project, so manpower is pretty precious, as well as time.
Any help and insights will be greatly appreciated.
Anyone offering you "unlimited storage" at a fixed rate is having you on.
We put our media files on Amazon S3, which is designed to handle trillions of files.
If you do host the uploaded data locally, please don't place your uploads folder anywhere in your web root or in a place directly accessible by remote users. Expanding your storage is easy, but recovering from a total website or server compromise is not!

Input on decision: file hosting with Amazon S3 or similar, and PHP

I appreciate your comments to help me decide on the following.
My requirements:
I have a site hosted on a shared server, and I'm going to provide content to my users: about 60 GB of content (about 2,000 files of 30 MB each; users will have access to only 20 files at a time). I calculate about 100 GB of monthly bandwidth usage.
Once a user registers for the content, links will be made accessible for the user to download, but I want the links to expire in 7 days, with the possibility of increasing the expiration time.
I think the disk space and bandwidth call for a service like Amazon S3 or Rackspace Cloud Files (or is there an alternative?).
To manage the expiration, I plan either to somehow obtain links that expire (I think S3 has that feature; Rackspace doesn't), OR to control the expiration date in my database and have a batch process that renames, on a daily basis, all 200 files on the cloud and in my database (in case a user copied the direct link, it won't work the next day; only my web page will have the updated links). PHP is used for the programming.
So what do you think? Is cloud file hosting the way to go? Which one? Does managing the links that way make sense, or is it too difficult to do through programming (sending commands to the cloud server)?
EDIT:
Some hosting companies have unlimited space and bandwidth on their shared plans. I asked their support staff, and they said that they really honor the "unlimited" deal, so 100 GB of transfer a month is OK; the only thing to watch out for is CPU usage. So shared hosting is one more alternative to choose from.
FOLLOWUP:
So, digging more into this, I found that the TOS of the unlimited plans say it is not permitted to use the space primarily to host multimedia files. So I decided to go with Amazon S3 and the solution provided by Tom Andersen.
Thanks for the input.
I personally don't think you necessarily need a cloud-based solution for this; it may be a little costly. You could simply get a dedicated server instead. One provider that comes to mind gives 3,000 GB/month of bandwidth on some of their lowest-level plans. That is on a 10 Mbit uplink; you can upgrade to 100 Mbps for $10/mo or 1 Gbit for $20/mo. I won't mention any names, but you can search for dedicated servers and possibly find one to your liking.
As for expiring the files, just implement that in PHP, backed by a database. You won't have to move files around: store all the files in a directory not accessible from the web, and use a PHP script to determine whether the link is valid. If it is, read the contents of the file and pass them through to the browser; if not, show an error message instead. It's a pretty simple concept, and I think there are a lot of pre-written scripts that do this, but depending on your needs, it isn't too difficult to do yourself.
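A minimal sketch of that pass-through script; the table, column, and directory names are illustrative:

```php
<?php
// download.php?token=... -- gated delivery from outside the web root.
// Sketch only: DB credentials, table/column names, and paths are illustrative.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'dbuser', 'dbpass');

$stmt = $pdo->prepare(
    'SELECT filename FROM download_links WHERE token = ? AND expires_at > NOW()'
);
$stmt->execute([$_GET['token'] ?? '']);
$file = $stmt->fetchColumn();

if ($file === false) {
    http_response_code(410);
    exit('This link has expired.');
}

// basename() blocks path traversal; the directory itself is not web-accessible
$path = '/srv/private-files/' . basename($file);

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . basename($file) . '"');
header('Content-Length: ' . (string) filesize($path));
readfile($path);
```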
Cloud hosting has advantages, but right now I think it's costly. Unless you're trying to spread the load geographically, or you plan on supporting thousands of simultaneous users and need the elasticity of the cloud, you could use a dedicated server instead.
Hope that helps.
I can't speak for S3 but I use Rackspace Cloud files and servers.
It's good in that you don't pay for incoming bandwidth, so uploads are super cheap.
I would do it like this:
Upload all the files you need to a 'private' container
Create a public container with CDN enabled
That'll give you a special url like http://c3214146.r65.ce3.rackcdn.com
Make your own CNAME DNS record for your domain pointing to that, like: http://cdn.yourdomain.com
When a user requests a file, use the COPY API operation with a long random filename to do a server-side copy from the private container to the public container (sketched below)
Store the filename in a MySQL DB for your app
Once the file expires, use the DELETE API operation, then the PURGE API operation to get it out of the CDN, and finally delete the record from the MySQL table.
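A sketch of that server-side COPY step done against the Cloud Files (OpenStack Swift) REST API directly; $token and $storageUrl would come from a prior authentication call, and the container names are illustrative:

```php
<?php
// Server-side copy in Cloud Files/Swift: an HTTP COPY request with a
// Destination header. Sketch only: containers and names are illustrative.
$randomName = bin2hex(random_bytes(16)) . '.zip'; // hard-to-guess public name

$ch = curl_init($storageUrl . '/private-files/' . rawurlencode($srcName));
curl_setopt_array($ch, [
    CURLOPT_CUSTOMREQUEST  => 'COPY',
    CURLOPT_HTTPHEADER     => [
        'X-Auth-Token: ' . $token,
        'Destination: /public-cdn/' . rawurlencode($randomName),
    ],
    CURLOPT_RETURNTRANSFER => true,
]);
curl_exec($ch);
curl_close($ch);

// $randomName is what you store in MySQL and hand to the user
```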
With the PURGE command, I heard it doesn't work 100% of the time and may leave the file around for an extra day; the docs also say to reserve its use for emergencies only.
Edit: I just heard there's a limit of 25 purges per day.
However, personally I've just used DELETE on objects and found that took them out of the CDN straight away. In summary, the worst case would be that the file is still accessible on some CDN nodes for 24 hours after deletion.
Edit: You can change the TTL (caching time) on the CDN nodes. The default is 72 hours, so it might pay to set it to something lower, but not so low that you lose the advantage of the CDN.
The advantages I find with the CDN are:
It pushes content right out to end users far away from the US servers and gives them super-fast download times.
If you have a super-popular file, it won't take out your site when 1,000 people start trying to download it, as they'd all get copies pushed out from whatever CDN node they're closest to.
You don't have to rename the files on S3 every day. Just make them private (which is the default), and hand out time-limited URLs, good for a day or a week, to anyone who is authorized.
I would consider making the links good for only 20 minutes, so that a user has to log in again in order to re-download the files. Then they can't even share the links they get from you.
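A minimal sketch of such a time-limited URL with the AWS SDK for PHP; the bucket and key are illustrative:

```php
<?php
// Generate a presigned S3 download link that simply stops working
// after 20 minutes -- no renaming, no batch job. Sketch only.
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client(['version' => 'latest', 'region' => 'us-east-1']);

$cmd = $s3->getCommand('GetObject', [
    'Bucket' => 'example-content-bucket',
    'Key'    => 'files/package-017.zip',
]);

$url = (string) $s3->createPresignedRequest($cmd, '+20 minutes')->getUri();
echo $url, PHP_EOL; // hand this to the logged-in user
```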

Image Storage and CDN for websites

Currently I am looking to move my websites' images to a storage service. I have two websites, developed in PHP and ASP.NET.
Using the Amazon S3 service we can host all our images and videos to serve web pages, but there are some limitations to using S3 when we want to serve images:
If the website needs thumbnail images at different sizes derived from the original image, it is tough. We would also need to subscribe to EC2. Though data transfer from S3 to EC2 is free, it takes time to transfer the data before the image resize operation can run.
Uploading a number of files in zip format and unzipping them in S3, to reduce the number of uploads, is not possible.
Downloading multiple files from S3 in bulk, in case we want to shift to another provider, is not possible.
Image names are case-sensitive in S3, so an image will not load if its name does not match the request exactly.
Among all these, the first one is the most important, since image resizing is a common requirement.
Which provider is best suited to achieving my goal? Can I move to Google App Engine just for the purpose of image hosting, or is there any other vendor who can provide the above services?
I've stumbled upon a nice company called Cloudinary that provides a CDN image storage service; they also provide a variety of ways to manipulate images on the fly (cropping will mainly concern you, as we're talking about different-sized thumbnails).
I'm not sure how they compete with other companies like MaxCDN in site speed enhancement, but from what I can see, they have more options when it comes to image manipulation.
S3 is really slow and also not distributed. CloudFront, in comparison, is also one of the slowest and most expensive CDNs you can get. The only advantage is that if you're already using other AWS services, you'll get one bill.
I blogged about different CDNs and ran some tests:
http://till.klampaeckel.de/blog/archives/100-Shopping-for-a-CDN.html
As for the setup, I'd suggest something that uses origin-pull: you host the images yourself, and the CDN requests a copy of each one the first time it's requested.
This also means you can use a script to "dynamically" generate the images, because each one will only be pulled once or so; you just have to set appropriate cache headers. The images will then stay cached until you purge the CDN's cache.
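A minimal sketch of such an origin-pull resize endpoint using GD; the source directory and the whitelist of widths are illustrative:

```php
<?php
// thumb.php?file=photo.jpg&w=200 -- generated on the first CDN pull,
// then served from the CDN's cache. Sketch only.
$file = basename($_GET['file'] ?? ''); // basename() blocks path traversal
$w    = (int) ($_GET['w'] ?? 200);

if (!in_array($w, [100, 200, 400], true)) { // whitelist allowed widths
    http_response_code(400);
    exit;
}

$src    = imagecreatefromjpeg('/srv/images/' . $file);
$scaled = imagescale($src, $w);             // height follows the aspect ratio

header('Content-Type: image/jpeg');
header('Cache-Control: public, max-age=31536000'); // let the CDN keep it
imagejpeg($scaled, null, 85);
```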
HTH
I've just come across CloudFlare. From what I understand from their site, you shouldn't need to make any changes to your website; apparently all you need to do is change your DNS settings. They even provide a free option.
If you're using EC2, then S3 is your best option. The "best practice" is to simply pre-render the image in all sizes and upload them with different names (see the sketch after the examples below). I.e.:
/images/image_a123.large.jpg
/images/image_a123.med.jpg
/images/image_a123.thumb.jpg
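A minimal sketch of pre-rendering those sizes at upload time with GD; the paths are illustrative, and the size labels match the naming scheme above:

```php
<?php
// Render the standard sizes once, at upload time, so S3 only ever
// serves static files. Sketch only: paths and widths are illustrative.
$sizes = ['large' => 1024, 'med' => 512, 'thumb' => 128];
$src   = imagecreatefromjpeg('/tmp/upload_a123.jpg');

foreach ($sizes as $label => $width) {
    $scaled = imagescale($src, $width); // keeps the aspect ratio
    imagejpeg($scaled, "/srv/images/image_a123.{$label}.jpg", 85);
    imagedestroy($scaled);
}
```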
This practice is in use by Digg, Twitter (once upon a time, maybe not with twimg...), and a host of other companies.
It may not be ideal, but it's the fastest and simplest way to do it. In terms of switching to another provider, you'll likely never do that, because of the amount of work it takes to transfer all of the files anyway. Whether you've got 1,000,000 images or 3,000,000 images, you've still got an enormous volume of files to move.
Fortunately, S3 has an import/export service. You can send them an empty hard drive, and they'll format it and copy your data onto it for a small fee.
In terms of your concern about case sensitivity, you won't find a provider that doesn't have it. If your code is written properly, you'll normalize all names to uppercase or lowercase, or use some sort of base-64 ID system that takes care of case for you.
All in all, S3 is going to give you the best "bang for your buck", and it has CloudFront support if you want to speed it up. Not using S3 because of reasons 3 and 4 is nonsense, as they'll likely apply anywhere you go.

cloud drive vs. cloud files (or should we not bother?)

The web application is in the process of moving from a standalone server to a pair of servers behind a load balancer, and contains a 50 GB directory of user-created data that is growing rapidly. On Rackspace, the only way to add disk space dynamically is by also doubling RAM and monthly cost, which isn't necessary. So, to Cloud Files it is (unless anyone has another solution in mind?).
Using JungleDisk, I can move the files to a Cloud Files container, mount the container on both servers, and create a symbolic link from the directories where the content was to the mounted drive. This would require no code modification. Alternatively, I could interface directly with Cloud Files using their PHP API, but this would require massive code changes (all the paths? really?).
Is there any inherent problem with taking the easy way out in this case? I set up a model and it seems to work well, but I usually seem to be missing something.
Thanks,
Brandon
I think mounting the drive makes a lot of sense for your scenario, but to be honest I haven't tried it under any load. The good news is that you can always try the easy approach and then refactor if it doesn't perform under load. I'd hope Rackspace accounted for and tested this exact scenario; it seems logical to me.
For some extra information: we faced the same question here and did a cost comparison of Cloud Sites vs. Cloud Files. We had to factor both bandwidth and amount of storage into the costs, because communication between Sites/Servers and Cloud Files still incurs bandwidth charges. In other words: do you have a lot of files that sit around, or a few files that get accessed often?
We spent a lot of time talking with Rackspace support about the performance and scalability differences between Cloud Sites and Cloud Files; I'd recommend giving them a call. We ultimately chose to just use Sites because of our needs; the cost difference was pretty insignificant as it scaled. Also, the Cloud Files API didn't have the granular security that we needed, so we would have had to write a gateway service anyway.
