Image Storage and CDN for websites - php

Currently I am looking to move my websites' images to a storage service. I have two websites, developed in PHP and ASP.NET.
Using Amazon S3 we can host all our images and videos and serve them on our web pages, but there are some limitations when serving images from S3:
1. If the website needs thumbnails in several sizes generated from the original image, it is difficult. We would also need to subscribe to EC2; although data transfer from S3 to EC2 is free, moving the data across before the resize operation still takes time.
2. Uploading a number of files as a single zip and unzipping it inside S3 (to reduce the number of uploads) is not possible.
3. Downloading multiple files from S3 in one go is not possible, in case we want to move to another provider.
4. Object names in S3 are case sensitive, so an image will not load if its name does not exactly match the request.
Of these, the first one matters most to me, since image resizing is a very common requirement.
Which provider is best suited to achieve my goal? Can I move to Google App Engine just for image hosting, or is there another vendor who can provide the services above?

I've stumbled upon a nice company called Cloudinary that provides a CDN image storage service - they also provide a variety of ways to manipulate images on the fly (cropping will mainly concern you, since you were talking about different-sized thumbnails).
I'm not sure how they compare with other companies like MaxCDN for site speed, but from what I can see they have more options when it comes to image manipulation.

S3 is really slow and also not distributed. CloudFront, in comparison, is also one of the slowest and most expensive CDNs you can get. The only advantage is that if you're already using other AWS services you'll get one bill.
I blogged about different CDNs and ran some tests:
http://till.klampaeckel.de/blog/archives/100-Shopping-for-a-CDN.html
As for the setup, I'd suggest something that uses origin pull: you host the images yourself and the CDN requests a copy from you the first time each image is requested.
This also means you can use a script to "dynamically" generate the images, because each one will only be pulled once or so. You just have to set appropriate cache headers. The images are then cached until you purge the CDN's cache.
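A minimal PHP sketch of what such an origin script could look like, assuming GD is available; the image directory, size limits, and query parameters are placeholders:

```php
<?php
// thumb.php?file=kitten.jpg&w=200
// Hypothetical origin script: the CDN pulls this URL once, then serves its cached copy.
$file  = basename($_GET['file'] ?? '');            // basename() avoids path traversal
$width = max(1, min(1000, (int)($_GET['w'] ?? 200)));
$src   = __DIR__ . '/images/' . $file;             // placeholder image directory

if (!is_file($src)) {
    http_response_code(404);
    exit;
}

[$origW, $origH] = getimagesize($src);
$height = (int) round($origH * $width / $origW);

$image = imagecreatefromjpeg($src);                // assumes JPEG input
$thumb = imagecreatetruecolor($width, $height);
imagecopyresampled($thumb, $image, 0, 0, 0, 0, $width, $height, $origW, $origH);

// Cache headers so the CDN (and browsers) hold on to the generated copy.
header('Content-Type: image/jpeg');
header('Cache-Control: public, max-age=31536000');
imagejpeg($thumb, null, 85);
```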
HTH

I've just come across CloudFlare - from what I understand from their site, you shouldn't need to make any changes to your website. Apparently all you need to do is change your DNS settings. They even provide a free option.

If you're using EC2, then S3 is your best option. The "best practice" is to simply pre-render the image in all sizes and upload them with different names, e.g.:
/images/image_a123.large.jpg
/images/image_a123.med.jpg
/images/image_a123.thumb.jpg
This practice is in use by Digg, Twitter (once upon a time, at least - maybe not with twimg...), and a host of other companies.
It may not be ideal, but it's the fastest and simplest way to do it. As for switching to another provider, you'll likely never do that anyway because of the amount of work involved in transferring all of the files: whether you've got 1,000,000 images or 3,000,000 images, that's still a huge amount of data to move.
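A rough sketch of the pre-render step in PHP with GD; the widths and naming pattern are only there to illustrate the scheme above:

```php
<?php
// Pre-render one uploaded image into the three sizes used above.
// $id is whatever unique key you already assign to the image (e.g. "a123").
function renderSizes(string $srcPath, string $id, string $outDir): void
{
    $sizes = ['large' => 1024, 'med' => 512, 'thumb' => 128];   // illustrative widths

    [$origW, $origH] = getimagesize($srcPath);
    $src = imagecreatefromjpeg($srcPath);                       // assumes JPEG input

    foreach ($sizes as $suffix => $width) {
        $height = (int) round($origH * $width / $origW);
        $dst = imagecreatetruecolor($width, $height);
        imagecopyresampled($dst, $src, 0, 0, 0, 0, $width, $height, $origW, $origH);
        imagejpeg($dst, "$outDir/image_$id.$suffix.jpg", 85);
        // Each output file would then be uploaded to S3 under the same key.
    }
}
```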
Fortunately, S3 has an import/export service. You can send them an empty hard drive and they'll format it and download your data to it for a small fee.
In terms of your concern about case sensitivity, you won't find a provider whose object names aren't case sensitive. If your code is written properly, you'll normalize all names to uppercase or lowercase, or use some sort of base-64 ID scheme that takes care of case for you.
All in all, S3 is going to give you the best "bang for your buck", and it has CloudFront support if you want to speed it up. Not using S3 because of reasons 3 and 4 is nonsense, as they'll likely apply anywhere you go.
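A purely illustrative example of that normalization, generating a lowercase, opaque key before it is ever stored (the key scheme is an assumption):

```php
<?php
// Normalize an uploaded file name into a predictable, lowercase S3 key.
function makeKey(string $originalName): string
{
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    $id  = bin2hex(random_bytes(8));          // opaque ID instead of the user's name
    return "images/$id.$ext";                 // e.g. images/9f86d081884c7d65.jpg
}
```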

Related

What are the pros and cons to using AWS/S3 for static content?

I'd like a little guidance from you all. I have a multimedia-heavy site hosted on traditional Linux-based LAMP hosting. The site is mostly image/video content: there are around 30,000+ posts, the database is only about 20-25 MB, but file-system usage is around 10 GB and roughly 800-900 GB of bandwidth (of the allowed 1 TB) gets used every month.
Now, after a little brainstorming and looking at my alternatives here and there, I have come up with two options:
Increase / Get a bigger hosting plan.
Get my static content stored on Amazon S3.
While the first option would be the simple one, I am actually leaning towards the second, i.e. storing my static content on Amazon S3. The website I have is totally custom-coded in PHP+MySQL. I went through this http://undesigned.org.za/2007/10/22/amazon-s3-php-class/ and it gave me a fair idea.
I would love to know the pros/cons of hosting static content on S3.
Please give your inputs.
Increase / Get a bigger hosting plan.
I would not do that. The reason is that storage is cheap, while the other components of a "bigger hosting plan" will cost you dearly without providing an immediate benefit (more memory is expensive if you don't need it).
Get my static content stored on Amazon S3.
This is the way to go. S3 is very inexpensive; it is a no-brainer. Having said that, since we are talking video here, I would recommend a third option:
3. Store the video on S3 and serve it through CloudFront. It is still rather inexpensive by comparison, given the spectacular bandwidth and global distribution. CloudFront is Amazon's CDN, giving blazing fast speeds to any location.
If you want to save on bandwidth, you may also consider using Amazon Elastic Transcoder for high-quality compression (to minimize your bandwidth usage).
Traditional hosting is way too expensive for this.
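If you do go that route, here is a hedged sketch of pushing your existing static files to S3 with the official AWS SDK for PHP (v3). The bucket name and local directory are placeholders, and credentials are assumed to come from the environment or an instance profile:

```php
<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'us-east-1',                  // pick your region
]);

$bucket  = 'my-static-content';                // placeholder bucket name
$baseDir = '/var/www/site/static';             // placeholder local directory

$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($baseDir));
foreach ($files as $file) {
    if (!$file->isFile()) {
        continue;
    }
    $key = ltrim(str_replace($baseDir, '', $file->getPathname()), '/');
    $s3->putObject([
        'Bucket'     => $bucket,
        'Key'        => $key,
        'SourceFile' => $file->getPathname(),
        'ACL'        => 'public-read',          // so the files are publicly readable
    ]);
}
```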
Bigger Hosting Plan
Going for a bigger hosting plan is not a permanent solution, because:
Static content (images/videos) always grows in size. This time your need is 1 TB; next time it will be more, and you will be back in the same situation.
As users and static content grow, your bandwidth usage will also increase and cost you more.
Your database is not that big, and we can assume you are not using much CPU power or memory, so you would only be using more disk space while paying for larger CPU and memory allocations you don't need.
Technically it is also not good to serve all requests from a single server; browsers limit the number of simultaneous requests per domain.
S3/ Cloud storage for static content
S3 or other cloud storage is a good option for static content. The benefits are:
You don't need to worry about storage space; it scales automatically and is available in abundance.
If your site is accessed from different locations worldwide, you can put a CDN in front of it to serve content from the nearest location.
The bandwidth is very cheap compared to traditional hosting.
It also takes load off your own server, since files are uploaded to and served from S3.
These are some of the benefits of S3 over traditional hosting - S3 is built specifically to serve static content. The decision is yours :)
If you're looking at the long term, at some point you might not be able to afford a server that will hold all of your data. I think S3 is a good option for a case like yours for the following reasons:
You don't need to worry about large file uploads tying down your server. With Cross-Origin Resource Sharing (CORS), you can upload files directly from the client to your S3 bucket (see the sketch after this list).
Modern browsers will often load parallel requests when a webpage requests content from different domains. If you have your pictures coming from yourbucket.s3.amazonaws.com and the rest of your website loading from yourdomain.com, your users might experience a shorter load time since these requests will be run in parallel.
At some point, you might want to use a Content Distribution Network (CDN) to serve your media. When this happens, you could use Amazon's cloudfront with out of the box support for S3, or you can use another CDN - most popular CDNs these days do support serving content from S3 buckets.
It's a problem you'll never have to worry about. Amazon takes care of redundancy, availability, backups, failovers, etc. That's a big load off your shoulders leaving you with other things to take care of knowing your media is being stored in a way that's scalable and future-proof (at least the foreseeable future).
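For the direct-to-S3 upload mentioned in the first point, a rough sketch using the AWS SDK for PHP (v3) to hand the browser a pre-signed PUT URL. The bucket name, key scheme, and expiry are assumptions, and the bucket would also need a CORS policy that allows PUT requests from your domain:

```php
<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client(['version' => 'latest', 'region' => 'us-east-1']);

// Create a pre-signed PUT URL the browser can upload to directly,
// so the file never passes through your own server.
$cmd = $s3->getCommand('PutObject', [
    'Bucket' => 'yourbucket',                              // placeholder bucket
    'Key'    => 'uploads/' . bin2hex(random_bytes(8)) . '.jpg',
]);
$request = $s3->createPresignedRequest($cmd, '+15 minutes');

echo (string) $request->getUri();                          // return this URL to the client
```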

AWS S3 EC2 Dilemma.

Recently my web design firm got a big contract to build a website that will be media-rich and needs to run on WordPress. The client wants it that way because of the simplicity of WordPress and the familiarity they have with it.
The hosting will undoubtedly be on AWS EC2, and we are torn between hosting the actual files on a separate instance or in an S3 bucket. I have never worked with S3, but have 2+ years of experience with EC2. Users uploading images, videos, documents, ... will be a big component of the website.
ANTICIPATED: Based on the market study done by another firm for the client, we expect upwards of 1000 unique visitors daily, of whom 5-10% would be uploading to the server/bucket.
AIM: A fast website with that kind of media richness.
Any advice on the choice of server / infrastructure settings?
WordPress does, however, store all of its files in the local file system by default. You can get plugins that store uploads in S3, although with only 1000 uniques it may not be necessary.
The biggest gain in speed is going to be with using caching systems (preferably caching to memory).
There are many options open to you, but with something like 1000 uniques per day, you don't have much to worry about. If you want to take advantage of S3 (and optionally a CDN in front of it), then:
Create a bucket in S3, with public CDN options enabled
Mount this bucket in Linux using S3 FUSE (s3fs) - guide here (see also the upload-path sketch after this list):
http://juliensimon.blogspot.de/2013/08/howto-aws-mount-s3-buckets-from-linux.html
Ensure memory caching is enabled in WordPress (W3 Total Cache)
Minify the CSS and JS files using W3 Cache (careful as this sometimes breaks the themes)
If site is not responsive enough consider using AWS CloudFront or CloudFlare
If the site must be online at all times, consider two instances with DNS round-robin. Keep WordPress synced using rsync, and ensure they both mount the same S3 bucket.
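For the mount step above, a hedged sketch of a tiny plugin that points the WordPress uploads directory at the s3fs mount; the mount point and public URL are placeholders, while `upload_dir` is a standard WordPress filter:

```php
<?php
/*
 * Plugin Name: Uploads on S3 mount (sketch)
 * Redirects the WordPress uploads directory to the s3fs mount point.
 */
add_filter('upload_dir', function (array $dirs): array {
    $mount = '/mnt/s3-bucket';                         // placeholder s3fs mount point
    $url   = 'https://yourbucket.s3.amazonaws.com';    // placeholder public URL

    if (is_dir($mount)) {                              // fall back to defaults if not mounted
        $dirs['basedir'] = $mount;
        $dirs['baseurl'] = $url;
        $dirs['path']    = $mount . $dirs['subdir'];
        $dirs['url']     = $url . $dirs['subdir'];
    }
    return $dirs;
});
```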
This should be more than enough.
Hope this helps.
1000 visitors a day is not really so large a strain on a server that I'd be especially worried about it. If it were me, I'd make sure to use caching (like datasage recommended), and also look into leveraging a CDN, especially since you're dealing with a lot of media. No matter which CDN you use - Cloudflare, MaxCDN, VideoPress, Amazon CloudFront, Akamai, or any one of many great content delivery network providers out there - I think you'll get a lot further with that than with tweaking your server. If you want to do that too, I'd suggest caching and NGINX. Obviously minify CSS and JS before you deploy as well, but that's kind of obvious.
I appreciate all the input; the consensus is that 1000 uniques/day is not much of a deal. Check.
I should mention, though, that we'll build every single piece of functionality ourselves as a main plugin, to fully control the features. Many themes nowadays are stuffed with unnecessary junk which doesn't help the case we're trying to make. I will look at StuartB's solution more closely, but I certainly do appreciate all of your inputs.

Common practice to compress image before sending to mobile device?

My application requires downloading many images from the server (each image is about 10 KB). I'm simply downloading each of them with an independent AsyncTask, without any optimization.
Now I'm wondering what the common practice is for transferring these images. For example, I'm thinking about keeping zipped images on the server and sending the zip file for the user's device to unzip. In that case, is it better to combine everything into one big zip file for the user to download?
Or there's better solution? Thanks in advance!
EDIT:
It seems combining images into zip files is a good idea, but it may take too long for the user to wait while all the images download and unzip. So I may put ten or twenty images in each zip file, so the user can see some of the downloaded ones while waiting for the rest. Having multiple AsyncTasks fire together can be faster, right? But they won't finish at the same time, even given the same file size and the same address to download from?
Since latency is often the largest problem with mobile connections, reducing the number of connections you have to open is a great way to optimize the loading times. Sending a zip file with all the images sounds like a very good idea, and is probably worth the time implementing.
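On the server side, bundling a batch of images into a single archive could look roughly like this in PHP; the paths and batch size are placeholders, and already-compressed images are stored rather than recompressed:

```php
<?php
// Bundle a batch of images into one zip so the client opens a single connection.
function bundleImages(array $imagePaths, string $zipPath): void
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot create $zipPath");
    }
    foreach ($imagePaths as $path) {
        $zip->addFile($path, basename($path));
        // JPEG/PNG are already compressed, so just store them (PHP 7+).
        $zip->setCompressionName(basename($path), ZipArchive::CM_STORE);
    }
    $zip->close();
}

// e.g. bundleImages(glob('/srv/images/batch1/*.jpg'), '/srv/zips/batch1.zip');
```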
The images are probably already compressed (GIF, JPG, PNG), so you will not reduce the file size, but you will reduce the number of connections, which is a good idea for mobile. If it is always the same set of images, you can use sprite techniques: send one bigger image file containing all the images and show the right one with a different x/y offset, e.g. as a CSS background position in HTML.
I was looking at the sidebar and saw this topic, but from the comments it seems you're really asking about patching in resources.
The main thing is to make sure the user knows what to do with the download: you want the user to download file X and get output Y for a specific purpose. That said, it appears to be common practice to download chunks of resources separately when they are not bundled with the Android app and cannot fit in the APK.
A comparable example is the JDIC apps, which use a popular Japanese dictionary resource used in tandem for English translations. JDIC apps like WWWJDIC download their extremely large reference files online, because those files would otherwise suffer from the bad latency (mentioned before) on Google's servers. It's also bad for your reputation to publish an app over 200 MB unless it is 3D, which is justifiable. If your images cannot be compressed without extremely long loading times in the app itself, you may need to consider this option. The only downside is that it requires an online connection (also mentioned before).
Also, you could use 7zip and program Android to self-extract it to a location. http://www.wikihow.com/Use-7Zip-to-Create-Self-Extracting-excutables
On another note, it would be optimal to have a one-time download on initial startup and then routine checks by the app. You can put this in an AsyncTask so the files are downloaded and used after a restart, or however you want it, so you really only need one AsyncTask. The benefit is that the user syncs once and may only need to check occasionally. The downside is that the user may not always be able to update and may need to use 4G or LTE, but that is a minor concern if they can use WiFi whenever they want.

Read image from folder using php or database

Is it better to read and list images directly from the file system using plain PHP, or is it better to store image metadata and the filename in a database and access the images via a MySQL SELECT? What are the pros and cons of both solutions?
Listing files on a file system is probably the easiest way to accomplish what you are trying to do, but it's going to be very slow if you are cycling through several thousand directories/files on a networked file system (NFS, CIFS, GlusterFS, etc.).
Storing file information in a database creates much more overhead, since you are now involving an external application to store the information. Remember that every time you use a database you are also paying for network I/O, the authentication mechanism, the query parser, and so on. At the same time, all of this overhead might still give a faster response than using a networked file system.
To conclude: everything depends on the number of files you are working with and the underlying infrastructure. The two major things to watch are disk I/O and network I/O.
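For reference, the plain file-system listing mentioned above is only a few lines of PHP; the directory path is a placeholder:

```php
<?php
// List image files straight from the file system, no database involved.
$dir    = __DIR__ . '/uploads';                        // placeholder directory
$images = glob($dir . '/*.{jpg,jpeg,png,gif}', GLOB_BRACE);   // GLOB_BRACE is not on every platform

foreach ($images as $path) {
    $name = basename($path);
    echo '<img src="/uploads/' . htmlspecialchars($name) . '" alt="">' . "\n";
}
```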
I would do the following:
Upload all the images in one directory
Store references to those images that are tied to the uploader's User ID
Then just select the image URLs that are tied to that ID, and output them however necessary.
People find it easier to store their files in folders and parse those folders with PHP. If you go the database route, the database just keeps getting larger and larger.
I can see it becoming a matter of personal preference, but I have personally gone with parsing folders for images rather than storing references in a database.
Depends on the scale of what you are doing.
This is what I would be doing.
Store the file metadata in the database. You can record quite a bit of information about the image this way.
Store the image file itself on a distributed storage system like Amazon S3, and keep the path in your metadata. Replication is part of the system, and it integrates easily with the CloudFront CDN.
Distribute the images through the Amazon CloudFront CDN.
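A hedged sketch of that layout, where the metadata row holds the S3 key and URLs are built against a CloudFront domain; the table, columns, and domain name are hypothetical:

```php
<?php
// Look up image metadata in MySQL and build CDN URLs from the stored S3 keys.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$stmt = $pdo->prepare(
    'SELECT title, s3_key FROM images WHERE user_id = :uid ORDER BY created_at DESC'
);
$stmt->execute(['uid' => 42]);

$cdnBase = 'https://d1234abcd.cloudfront.net';   // placeholder CloudFront domain
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    $url = $cdnBase . '/' . $row['s3_key'];
    echo '<img src="' . htmlspecialchars($url) . '" alt="'
       . htmlspecialchars($row['title']) . "\">\n";
}
```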

What's the best way to manage multiple media servers, and file allocations between them?

I have a file-hosting website that's burning through 2 Gbit of bandwidth, so I need to start adding secondary media servers to store the files. What would be the best way to manage a multiple-server setup with a large number of files? Preferably through PHP only.
Currently I only have around 100 GB of files, so I could get a second server, mirror all content between them, and then round-robin the traffic 50/50, 33/33/33, etc. But once the total amount of files grows beyond the capacity of a single server, this won't work.
The idea I had was to keep a list of media servers in the DB along with the amount of free space left on each. Once a file is uploaded, PHP chooses which server the file actually goes to, spreading the files evenly among the servers.
Was hoping to get some more input/inspiration.
I can't use any third-party services like Amazon. The files range from a few bytes to a gigabyte.
Thanks
You could try MogileFS. It is a distributed file system. Has a good API for PHP. You can create categories and upload a file to that category. For each category you can define on how many servers it should be distributed. You can use the API to get a URL to that file on a random node.
If you are doing as much data transfer as you say, it would seem whatever it is you are doing is growing quite rapidly.
It might be worth your while to contact your hosting provider and see if they offer any sort of shared storage solutions via iscsi, nas, or other means. Ideally the storage would not only start out large enough to store everything you have on it, but it would also be able to dynamically grow beyond your needs. I know my hosting provider offers a solution like this.
If they do not, you might consider colocating your servers somewhere that either does offer a service like that, or would allow you install your own storage server (which could be built cheaply from off the shelf components and software like Freenas or Openfiler).
Once you have a centralized storage platform, you could then add web-servers to your hearts content and load balance them based on load, all while accessing the same central storage repository.
Not only is this the correct way to do it, it would also offer you much more redundancy and expandability in the future if your endeavor continues to grow at its current pace.
The other solutions offered, using a database repository of what is stored where, would work, but they not only add an extra layer of complexity, they also add an extra layer of processing between your visitors and the data they wish to access.
What if you lost a hard disk, do you lose 1/3 or 1/2 of all your data?
Should the heavy IO's of static content be on the same spindles as the rest of your operating system and application data?
Your best bet is really to get your files into some sort of storage that scales. Storing files locally should only be done with good reason (they are sensitive, private, etc.)
Your best bet is to move your content into the cloud. Mosso's CloudFiles or Amazon's S3 will both allow you to store an almost infinite amount of files. All your content is then accessible through an API. If you want, you can then use MySQL to track meta-data for easy searching, and let the service handle the actual storage of the files.
I think your own idea is not the worst one: get a bunch of servers, and for every file store which server(s) it's on. If new files are uploaded, use most-free-space first* (see the sketch after this answer). Every server handles its own delivery (instead of piping everything through the main server).
pros:
You can use multiple servers for a single file, e.g. for cutekitten.jpg: filepath="server1\cutekitten.jpg;server2\cutekitten.jpg", and then choose the server depending on server load (or randomly, or alternating, ...).
If you're careful you may be able to move files around automatically depending on the current load. So if your cute-kitten image gets reddited/slashdotted hard, move it to the server with the lowest load and update the entry.
You could do this with a cron job: just log the downloads for the last xx minutes, try some formula like downloads-per-minute * filesize * (product of server loads) for weighting, and pick thresholds for increasing/decreasing the number of servers those files are distributed to.
If you add a new server, it's relatively painless (just add its address to the server pool).
cons:
homebrew solutions are always risky
your load distribution algorithm must be well tested, otherwise bad things could happen (everything mirrored everywhere)
constantly moving files around for balancing adds additional server load
* or use a mixed weighting algorithm: free-space, server-load, file-popularity
disclaimer: never been in the situation myself, just guessing.
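A minimal sketch of the most-free-space-first placement described above; the table and column names are hypothetical, and a real version would wrap the bookkeeping in a transaction:

```php
<?php
// Pick the media server with the most free space and record where the file went.
function pickServerAndStore(PDO $pdo, string $fileName, int $fileSize): string
{
    // Hypothetical schema: servers(id, hostname, free_bytes), files(name, server_id, size).
    $server = $pdo->query(
        'SELECT id, hostname, free_bytes FROM servers ORDER BY free_bytes DESC LIMIT 1'
    )->fetch(PDO::FETCH_ASSOC);

    // Record the placement and decrement the cached free space.
    $pdo->prepare('INSERT INTO files (name, server_id, size) VALUES (?, ?, ?)')
        ->execute([$fileName, $server['id'], $fileSize]);
    $pdo->prepare('UPDATE servers SET free_bytes = free_bytes - ? WHERE id = ?')
        ->execute([$fileSize, $server['id']]);

    // The actual upload to $server['hostname'] (e.g. over SFTP or HTTP) happens here.
    return $server['hostname'];
}
```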
Consider HDFS, which is part of Apache Hadoop. It will integrate with PHP, but you'll be setting up a second application. It also solves your points about balancing among servers and handling things when your file-space usage exceeds one server's capacity. It's not purely PHP, though, but I don't think that's what you meant by "PHP only" anyway.
See http://hadoop.apache.org/core/docs/current/hdfs_design.html for the idea behind it. It covers how HDFS handles large files, many files, replication, and so on.
