cloud drive vs. cloud files (or should we not bother?) - php

The web application is moving from a standalone server to a pair of servers behind a load balancer, and it contains a 50 GB directory of user-created data that is growing rapidly. On Rackspace, the only way to add disk space dynamically also doubles the RAM and the monthly cost, which isn't necessary. So, to Cloud Files it is (unless anyone has another solution in mind?). Using JungleDisk, I can move the files to a Cloud Files container, mount that container on both servers, and create a symbolic link from the directories where the content used to live to the mounted drive. This would require no code modification. Alternatively, I could interface directly with Cloud Files using their PHP API, but this would require massive code changes (all the paths? really?). Is there any inherent problem with taking the easy way out in this case? I set up a test model and it seems to work well, but I usually seem to be missing something.
Thanks,
Brandon
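For reference, the symlink approach described in the question can be sketched roughly as follows. This is a minimal Python sketch, not the poster's actual setup: the function name and both directory arguments are hypothetical, and it assumes the cloud container is already mounted (for example by JungleDisk) at `mounted_dir` on a Linux host.

```python
import os
import shutil


def relocate_with_symlink(app_dir: str, mounted_dir: str) -> None:
    """Move the contents of app_dir onto the mounted volume, then leave a
    symlink at app_dir so existing code paths keep working unchanged."""
    os.makedirs(mounted_dir, exist_ok=True)
    for name in os.listdir(app_dir):
        shutil.move(os.path.join(app_dir, name), os.path.join(mounted_dir, name))
    os.rmdir(app_dir)                 # the directory is empty at this point
    os.symlink(mounted_dir, app_dir)  # old path now resolves to the mount
```

The move would be done once, from one server; the second server then only needs the mount and the symlink.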

I think mounting the drive makes a lot of sense for your scenario, but to be honest I haven't tried it with any load. The good news is that you could always try the easy approach and then refactor if it doesn't perform under load. I'd hope Rackspace accounted for and tested this exact scenario; it seems logical to me.
For some extraneous information, we faced the same question here and did a cost comparison of using Cloud Sites vs Cloud Files. We had to factor both bandwidth and amount of storage into the costs, because communication between Sites/Servers and Cloud Files still incurs bandwidth charges. In other words, do you have a lot of files that sit around, or a few files that get accessed often?
We spent a lot of time talking with Rackspace support about performance and scalability differences between Cloud Sites and Cloud Files, and I'd recommend giving them a call. We ultimately chose to just use Sites because of our needs; the cost difference was pretty insignificant as it scaled. Also, the Cloud Files API didn't have the granular security that we needed, so we would have had to write a gateway service anyway.
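The storage-versus-bandwidth trade-off mentioned above can be made concrete with a small calculation. This is a Python sketch under stated assumptions: the per-GB prices are made-up placeholders, not Rackspace's actual rates, and the function name is hypothetical.

```python
def monthly_cost(stored_gb: float, transferred_gb: float,
                 price_per_gb_stored: float,
                 price_per_gb_transferred: float) -> float:
    """Monthly bill = storage charge + bandwidth charge."""
    return (stored_gb * price_per_gb_stored
            + transferred_gb * price_per_gb_transferred)


# Placeholder prices in dollars per GB (assumptions, for illustration only).
STORAGE_PRICE = 0.10
BANDWIDTH_PRICE = 0.20

# "A lot of files that sit around": storage dominates the bill.
archive_profile = monthly_cost(500, 20, STORAGE_PRICE, BANDWIDTH_PRICE)

# "A few files that get accessed often": bandwidth dominates the bill.
hot_profile = monthly_cost(5, 2000, STORAGE_PRICE, BANDWIDTH_PRICE)
```

Plugging in the provider's real prices and your own measured traffic shows which of the two profiles you are closer to.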

Related

What are the pros and cons to using AWS/S3 for static content?

I would like a little guidance from you all. I have a multimedia-based site hosted on traditional Linux-based LAMP hosting. The site consists mostly of image and video content: there are 30,000+ posts, the database is around 20-25 MB, but file system usage is about 10 GB, and around 800-900 GB of bandwidth (of the allowed 1 TB) is used every month.
Now, after a little brainstorming and looking at my alternatives, I have come up with two options:
Increase / Get a bigger hosting plan.
Get my static content stored on Amazon S3.
While the first plan is the simpler option, I am actually leaning toward the second one, i.e. storing my static content on Amazon S3. The website I have is entirely custom-coded and based on PHP+MySQL. I went through this http://undesigned.org.za/2007/10/22/amazon-s3-php-class/ and it gave me a fair idea.
I would love to know the pros/cons of hosting static content on S3.
Please give your inputs.
Increase / Get a bigger hosting plan.
I would not do that. The reason is, storage is cheap, while the other components of a "bigger hosting plan" will cost you dearly without providing an immediate benefit (more memory is expensive if you don't need it)
Get my static content stored on Amazon S3.
This is the way to go. S3 is very inexpensive, it is a no-brainer. Having said that, since we are talking video here, I would recommend a third option:
[3.] Store video on AWS S3 and serve through CloudFront. It is still rather inexpensive by comparison, given the spectacular bandwidth and global distribution. CloudFront is Amazon's CDN for blazing fast speeds to any location.
If you want to save on bandwidth, you may also consider using Amazon Elastic Transcoder for high-quality compression (to minimize your bandwidth usage).
Traditional hosting is way too expensive for this.
Bigger Hosting Plan
Going for a bigger hosting plan is not a permanent solution because:
Static content (images/videos) always grows in size. This time your need is 1 TB; next time it will be more, so you will be in the same situation again.
With the growth of users and static content, your bandwidth usage will also increase and cost you more.
Your database is not very big, so we can assume you are not using a lot of CPU power or memory. You would only be using more disk space while paying for a larger CPU and more memory that you are not using.
Technically, it is not good to serve all your requests from a single server. Browsers limit the number of simultaneous requests per domain.
S3 / cloud storage for static content
S3 or other cloud storage is a good option for static content. The benefits are as follows.
You don't need to worry about storage space; it scales automatically and is available in abundance.
If your site is accessed from different locations worldwide, you can put a CDN in front of it so content is delivered from the nearest location.
Bandwidth is very cheap compared to traditional hosting.
It also takes load off your server, since files are uploaded to and served from S3.
These are some of the benefits of using S3 over traditional hosting, as S3 is specifically built to serve static content. The decision is yours :)
If you're looking at the long term, at some point you might not be able to afford a server that will hold all of your data. I think S3 is a good option for a case like yours for the following reasons:
You don't need to worry about large file uploads tying down your server. With Cross Origin Resource Sharing, you can upload files directly from the client to your S3 bucket.
Modern browsers will often load parallel requests when a webpage requests content from different domains. If you have your pictures coming from yourbucket.s3.amazonaws.com and the rest of your website loading from yourdomain.com, your users might experience a shorter load time since these requests will be run in parallel.
At some point, you might want to use a Content Distribution Network (CDN) to serve your media. When this happens, you could use Amazon's CloudFront, which supports S3 out of the box, or you can use another CDN; most popular CDNs these days support serving content from S3 buckets.
It's a problem you'll never have to worry about. Amazon takes care of redundancy, availability, backups, failovers, etc. That's a big load off your shoulders leaving you with other things to take care of knowing your media is being stored in a way that's scalable and future-proof (at least the foreseeable future).
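Serving media from the bucket's domain while the pages stay on your own domain mostly comes down to how asset URLs are generated. The Python sketch below illustrates the idea; the host names are the placeholders used in the answer above, and the function name and extension list are assumptions made for the example.

```python
import os
from urllib.parse import quote

# Extensions treated as static media (an assumption for this example).
STATIC_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".mp4", ".mp3"}


def asset_url(path: str,
              site_host: str = "yourdomain.com",
              bucket_host: str = "yourbucket.s3.amazonaws.com") -> str:
    """Return a URL on the bucket host for media files and on the site
    host for everything else."""
    extension = os.path.splitext(path)[1].lower()
    host = bucket_host if extension in STATIC_EXTENSIONS else site_host
    # quote() percent-encodes spaces and similar characters but keeps "/".
    return f"https://{host}/{quote(path.lstrip('/'))}"
```

If a CDN is added later, only `bucket_host` needs to change, which is one reason to route every media URL through a single helper.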

AWS S3 EC2 Dilemma.

Recently my web design firm got a big contract to build a website that will be media rich and needs to run on WordPress. The client wants it that way because of WordPress's simplicity and their familiarity with it.
The hosting will undoubtedly be on AWS EC2, and we are now torn between hosting the actual files on a separate instance or in an S3 bucket. I have never worked with S3, but have some 2+ years of experience with EC2. Users uploading images, videos, documents, etc. will be a big component of the website.
ANTICIPATED: Based on the market study done by another firm for the client, we expect upwards of 1000 unique visitors daily, of whom 5-10% would be uploading to the server/bucket.
AIM: A fast website with that kind of media richness
Any advice as to the choice of the server / infrastructure settings/choices?
By default, however, WordPress stores all of its files in the local file system. You can get plugins that allow uploads to be stored in S3, although with only 1000 uniques it may not be necessary.
The biggest gain in speed is going to be with using caching systems (preferably caching to memory).
There are many options open to you, but with something like 1000 uniques per day, you don't have much to worry about. If you want to serve the media from S3 then:
Create a bucket in S3 and make its contents publicly readable (S3 itself is not a CDN; CloudFront can be put in front of it later)
Mount this bucket in Linux using a FUSE-based S3 file system. Guide here:
http://juliensimon.blogspot.de/2013/08/howto-aws-mount-s3-buckets-from-linux.html
Ensure memory caching is enabled in WordPress (W3 Total Cache)
Minify the CSS and JS files using W3 Total Cache (careful, as this sometimes breaks themes)
If the site is not responsive enough, consider using AWS CloudFront or CloudFlare
If the site must be online at all times, consider 2 instances with DNS round robin. Keep WordPress synced using rsync. Ensure they both mount the same S3 bucket.
This should be more than enough.
Hope this helps.
1000 visitors a day is not so large a strain on a server that I'd be especially worried about it. If it were me, I'd make sure to use caching (like datasage recommended), and also look into leveraging a CDN, especially since you're dealing with a lot of media. No matter what CDN you use, be it Cloudflare, MaxCDN, VideoPress, Amazon CloudFront, Akamai, or any one of many great content delivery network providers out there, I think you'll get a lot further with that than you will tweaking your server. If you want to do that too, I'd suggest caching and NGINX. Minify CSS and JS before you deploy as well, but that's kinda obvious.
I appreciate all the input. The consensus is that 1000 uniques / day is not much of a deal. Check.
I should mention, though, that we'll build every single piece of functionality ourselves as a main plugin, to have full control over the features. Many themes nowadays are stuffed with unnecessary junk, which doesn't help the case we're trying to make. I will look at StuartB's solution more closely, but I certainly do appreciate all of your inputs.

images/videos/mp3s on Network file system using php

I did some Google searches and can't seem to find what I want. I'm designing my web site to use MySQL and PHP web servers. Multiple web servers with load balancers and a MySQL Cluster for scaling are planned so far. But then I get to images/videos/mp3s. I need a file system that multiple servers can read files from and write files to. So one server can run MySQL, the networked file system and the web server at first, but as the site scales it can be switched to multiple servers. Does anyone have any examples, tutorials or resources to help me with this? The site runs on Ubuntu servers. My original idea was to just store the images in MySQL (I know how to do that and have working examples) so all servers could read/write, but other people told me that's a bad idea and that I should use a file system (but I don't want to use the local one, as I don't think it can scale for large sites).
There are three systems that come to mind: MogileFS, MongoDB GridFS and a cloud-based storage solution.
MogileFS (OMG Files!) was developed for LiveJournal and stores metadata in MySQL. It uses that to find the actual disk with the appropriate file and streams it out.
MongoDB GridFS is a lot newer, and probably easier to get going, certainly for a smaller system. It uses a new 'NoSQL' database to store parts of files across its database, assembling them as required. Searching around will turn up plenty of information.
Finally, you could simply avoid the whole issue and just upload images into Amazon's S3 or Rackspace Cloud Files. I've done the latter before (though the site was already running inside Rackspace's system) and it's not very difficult, again with plenty of examples around.
For S3 there is also a command-line tool, s3cmd, that can be set to sync (or, better, upload and then delete) a directory full of files into an S3 'bucket'.
First, storing images or other large files in MySQL is not really practical in a default configuration because of the maximum size limitation.
To quote this answer to "Choosing data type for MySQL?":
MySQL is incapable of working with any data that is larger than max_allowed_packet (default: 1M) in size, unless you construct complicated and memory intense workarounds at the server side. This further restricts what can be done with TEXT/BLOB-like types, and generally makes the LARGETEXT/LARGEBLOB type useless in a default configuration.
Now, for storage and upgrade compatibility, why not just store them on a NAS or RAID system that you can continue to tack drives onto? Then in your DB just store a path to the file. It is much less DB-intensive and allows for decent scalability.
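The "file on shared storage, path in the database" pattern might look something like the Python sketch below. It is only an illustration: SQLite stands in for MySQL so the example is self-contained, and the table name, column names and function names are all hypothetical. `storage_root` would be the mount point of the NAS or RAID volume.

```python
import os
import sqlite3
import uuid


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "id INTEGER PRIMARY KEY, original_name TEXT, "
        "rel_path TEXT, size_bytes INTEGER)"
    )


def save_upload(conn: sqlite3.Connection, storage_root: str,
                filename: str, data: bytes) -> int:
    """Write the bytes to shared storage and record only the path in the DB."""
    extension = os.path.splitext(filename)[1]
    rel_path = uuid.uuid4().hex + extension   # unique name avoids collisions
    with open(os.path.join(storage_root, rel_path), "wb") as fh:
        fh.write(data)
    cur = conn.execute(
        "INSERT INTO files (original_name, rel_path, size_bytes) VALUES (?, ?, ?)",
        (filename, rel_path, len(data)),
    )
    conn.commit()
    return cur.lastrowid


def load_upload(conn: sqlite3.Connection, storage_root: str, file_id: int) -> bytes:
    """Look the path up in the DB, then read the file from shared storage."""
    row = conn.execute(
        "SELECT rel_path FROM files WHERE id = ?", (file_id,)
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    with open(os.path.join(storage_root, row[0]), "rb") as fh:
        return fh.read()
```

Storing a path relative to `storage_root` rather than an absolute one means the volume can be remounted elsewhere without rewriting rows.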

CakePHP High-Availability Server Farm setup

I am currently working on configuring my CakePHP (1.3) based web app to run in an HA setup. I have 4 web boxes running the app itself and a MySQL cluster for the database backend. I have users uploading 12,000 - 24,000 images a week (35-70 GB). The app then generates 2 additional files from the original, a thumbnail and a medium-size image for preview. This means a total of 36,000 - 72,000 possible files added to the repositories each week.
What I am trying to wrap my head around is how to handle large numbers of static file requests coming from users trying to view these images. I mean, I can have multiple web boxes serving only static files with a load balancer dispatching the requests.
But does anyone on here have any ideas on how to keep all static file servers in sync?
If any of you have any experiences you would like to share, or any useful links for me, it would be much appreciated.
Thanks,
serialk
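The volume figures in the question can be checked, and projected forward, with a few lines of arithmetic. This Python sketch uses only the numbers given above; the function names are hypothetical, and the yearly figure covers originals only because the size of the generated thumbnails and previews is not stated.

```python
def weekly_file_count(uploads_per_week: int, derivatives_per_upload: int = 2) -> int:
    """Each upload yields the original plus its generated derivatives."""
    return uploads_per_week * (1 + derivatives_per_upload)


def average_original_mb(weekly_gb: float, uploads_per_week: int) -> float:
    """Average size of one uploaded original, in MB (1 GB = 1024 MB)."""
    return weekly_gb * 1024 / uploads_per_week


def yearly_original_gb(weekly_gb: float, weeks: int = 52) -> float:
    """Storage added per year by the originals alone."""
    return weekly_gb * weeks


low_files, high_files = weekly_file_count(12_000), weekly_file_count(24_000)
low_year, high_year = yearly_original_gb(35), yearly_original_gb(70)
```

The low and high file counts match the 36,000 and 72,000 quoted in the question, and at these rates the originals alone add roughly 1.8 to 3.6 TB per year, which is the number any sync or replication scheme has to keep up with.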
It's quite a thorny problem.
Technically you can get a high-availability shared directory through something like NFS (or SMB if you like), using DRBD and Linux-HA for an active/passive setup. Such a setup will have good availability against single server loss, however, such a setup is quite wasteful and not easy to scale - you'd have to have the app itself decide which server(s) to go to, configure NFS mounts etc, and it all gets rather complicated.
So I'd probably advocate not keeping the images in a filesystem at all, or at least not the conventional kind. I am assuming that you need the flexibility to add more storage in the future; if you can keep the storage and IO requirements constant, DRBD with HA NFS is probably a good system.
For storing files in a flexible "cloud", either
Tahoe-LAFS
Or perhaps, at a push, Cassandra, which would require a bit more integration but may be better in some ways.
MySQL Cluster is not great for big blobs, as it (mostly) keeps the data in RAM; also, the high consistency it provides requires a lot of locking, which makes updates scale (relatively) badly at high workloads.
But you could still consider putting the images in MySQL Cluster anyway, particularly as you have already set it up; it would require no additional operational overhead.

What's the best way to manage multiple media servers, and file allocations between them?

I have a file hosting website that's burning through 2 Gbit of bandwidth, so I need to start adding secondary media servers to store the files. What would be the best way to manage a multiple-server setup with a large number of files? Preferably through PHP only.
Currently, I only have around 100 GB of files, so I could get a 2nd server, mirror all content between them, and then round-robin the traffic 50/50, 33/33/33, etc. But once the total amount of files grows beyond the capacity of a single server, this won't work.
The idea that I had was to have a list of media servers stored in the DB with the amount of free space left on each server. Once a file is uploaded, PHP will choose which server the file is actually uploaded to, and spread all the files evenly among the servers.
Was hoping to get some more input/inspiration.
Can't use any 3rd party services like Amazon. The files range from several bytes to a gigabyte.
Thanks
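The allocation idea described in the question (pick a media server by free space at upload time) could be sketched as below. It is a minimal Python illustration, not a tested design: the class and function names are hypothetical, and in practice the server list would be loaded from, and written back to, the database table the question mentions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MediaServer:
    host: str
    free_bytes: int


def choose_server(servers: List[MediaServer], file_size: int) -> MediaServer:
    """Pick the server with the most free space that can still hold the file,
    and reserve the space so the next upload sees the updated figure."""
    candidates = [s for s in servers if s.free_bytes >= file_size]
    if not candidates:
        raise RuntimeError("no media server has enough free space")
    target = max(candidates, key=lambda s: s.free_bytes)
    target.free_bytes -= file_size
    return target
```

Always choosing the emptiest server evens out disk usage over time, but it does not balance bandwidth: a freshly added, empty server would receive every new upload until it catches up.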
You could try MogileFS. It is a distributed file system. Has a good API for PHP. You can create categories and upload a file to that category. For each category you can define on how many servers it should be distributed. You can use the API to get a URL to that file on a random node.
If you are doing as much data transfer as you say, it would seem whatever it is you are doing is growing quite rapidly.
It might be worth your while to contact your hosting provider and see if they offer any sort of shared storage solution via iSCSI, NAS, or other means. Ideally the storage would not only start out large enough to store everything you have on it, but would also be able to grow dynamically beyond your needs. I know my hosting provider offers a solution like this.
If they do not, you might consider colocating your servers somewhere that either does offer a service like that, or would allow you to install your own storage server (which could be built cheaply from off-the-shelf components and software like FreeNAS or Openfiler).
Once you have a centralized storage platform, you could then add web servers to your heart's content and load balance them based on load, all while accessing the same central storage repository.
Not only is this the correct way to do it, it would offer you much more redundancy and expandability in the future if your endeavor continues to grow at its current pace.
The other solutions offered, which use a database to record what is stored where, would work, but they add not only an extra layer of complexity but also an extra layer of processing between your visitors and the data they wish to access.
What if you lost a hard disk, do you lose 1/3 or 1/2 of all your data?
Should the heavy IO's of static content be on the same spindles as the rest of your operating system and application data?
Your best bet is really to get your files into some sort of storage that scales. Storing files locally should only be done with good reason (they are sensitive, private, etc.)
Your best bet is to move your content into the cloud. Mosso's CloudFiles or Amazon's S3 will both allow you to store an almost infinite amount of files. All your content is then accessible through an API. If you want, you can then use MySQL to track meta-data for easy searching, and let the service handle the actual storage of the files.
i think your own idea is not the worst one. get a bunch of servers, and for every file store which server(s) it's on. if new files are uploaded, use most-free-space first*. every server handles its own delivery (instead of piping through the main server).
pros:
use multiple servers for a single file. e.g. for cutekitten.jpg: filepath="server1\cutekitten.jpg;server2\cutekitten.jpg", and then choose the server depending on the server load (or randomly, or alternating, ...)
if you're careful you may be able to move around files automatically depending on the current load. so if your cute-kitten image gets reddited/slashdotted hard, move it to the server with the lowest load and update the entry.
you could do this with a cron-job. just log the downloads for the last xx minutes. try some formula like (downloads-per-minute * filesize * (product of serverloads)) for weighting. pick thresholds for increasing/decreasing the number of servers those files are distributed to.
if you add a new server, it's relatively painless (just add the address to the server pool)
cons:
homebrew solutions are always risky
your load distribution algorithm must be well tested, otherwise bad things could happen (everything mirrored everywhere)
constantly moving files around for balancing adds additional server load
* or use a mixed weighting algorithm: free-space, server-load, file-popularity
disclaimer: never been in the situation myself, just guessing.
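One possible reading of the weighting idea above (downloads per minute times file size times the product of the loads of the servers currently holding the file) is sketched below in Python. The function names, the threshold parameters and the one-step-at-a-time policy are assumptions made for the example, not part of the answer.

```python
import math
from typing import Sequence


def hotness(downloads_per_minute: float, file_size_bytes: int,
            server_loads: Sequence[float]) -> float:
    """Weight for one file: traffic volume scaled by how busy its hosts are."""
    return downloads_per_minute * file_size_bytes * math.prod(server_loads)


def target_replicas(current: int, score: float, grow_above: float,
                    shrink_below: float, max_replicas: int) -> int:
    """Add one replica for hot files, drop one for cold files, else keep.
    Never goes below one copy or above max_replicas."""
    if score > grow_above and current < max_replicas:
        return current + 1
    if score < shrink_below and current > 1:
        return current - 1
    return current
```

A cron job would compute `hotness` from the download log of the last few minutes and then copy or delete replicas to match `target_replicas`. Changing the count by only one step per run and capping it at `max_replicas` is one way to guard against the "everything mirrored everywhere" failure listed under cons.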
Consider HDFS, which is part of Apache's Hadoop. This will integrate with PHP, but you'll be setting up a second application. This will also solve all your points of balancing among servers and handling things when your file space usage exceeds one server's ability. It's not purely in PHP, though, but I don't think that's what you meant when you said "pure" anyway.
See http://hadoop.apache.org/core/docs/current/hdfs_design.html for the idea of it. They cover the whole idea of how it handles large files, many files, replication, etc.
