I recently experienced a flood of traffic on a Facebook app I created (mostly for the sake of education, not with any intention of marketing)
Needless to say, I did not think about scalability when I created the app. I'm now in a position where my meager virtual server hosted by MediaTemple isn't cutting it at all, and it's really coming down to raw I/O of the machine. Since this project has been so educating to me so far, I figured I'd take this as an opportunity to understand the Amazon EC2 platform.
The app itself is created in PHP (using Zend Framework) with a MySQL backend. I use application caching wherever possible with memcached. I've spent the weekend playing around with EC2, spinning up instances, installing the packages I want, and mounting an EBS volume to an instance.
But what's the next logical step that is going to yield good results for scalability? Do I fire up an AMI instance for the MySQL and one for the Apache service? Or do I just replicate the instances out as many times as I need them and then do some sort of load balancing on the front end? Ideally, I'd like to have a centralized database because I do aggregate statistics across all database rows, however, this is not a hard requirement (there are probably some application specific solutions I could come up with to work around this)
I know this is probably not a straight forward answer, so opinions and suggestions are welcome.
So many questions - all of them good though.
In terms of scaling, you've a few options.
The first is to start with a single box. You can scale upwards - with a more powerful box. EC2 have various sized instances. This involves a server migration each time you want a bigger box.
Easier is to add servers. You can start with a single instance for Apache & MySQL. Then when traffic increases, create a separate instance for MySQL and point your application to this new instance. This creates a nice layer between application and database. It sounds like this is a good starting point based on your traffic.
Next you'll probably need more application power (web servers) or more database power (MySQL cluster etc.). You can have your DNS records pointing to a couple of front boxes running some load balancing software (try Pound). These load balancing servers distribute requests to your webservers. EC2 has Elastic Load Balancing which is an alternative to managing this yourself, and is probably easier - I haven't used it personally.
Something else to be aware of - EC2 has no persistent storage. You have to manage persistent data yourself using the Elastic Block Store. This guide is an excellent tutorial on how to do this, with automated backups.
I recommend that you purchase some reserved instances if you decide EC2 is the way forward. You'll save yourself about 50% over 3 years!
Finally, you may be interested in services like RightScale which offer management services at a cost. There are other providers available.
First step is to separate concerns. I'd split off with a separate MySQL server and possibly a dedicated memcached box, depending on how high your load is there. Then I'd monitor memory and CPU usage on each box and see where you can optimize where possible. This can be done with spinning off new Media Temple boxes. I'd also suggest Slicehost for a cheaper, more developer-friendly alternative.
Some more low-budget PHP deployment optimizations:
Using a more efficient web server like nginx to handle static file serving and then reverse proxy app requests to a separate Apache instance
Implement PHP with FastCGI on top of nginx using something like PHP-FPM, getting rid of Apache entirely. This may be a great alternative if your Apache needs don't extend far beyond mod_rewrite and simpler Apache modules.
If you prefer a more high-level, do-it-yourself approach, you may want to check out Scalr (code at Google Code). It's worth watching the video on their web site. It facilities a scalable hosting environment using Amazon EC2. The technology is open source, so you can download it and implement it yourself on your own management server. (Your Media Temple box, perhaps?) Scalr has pre-built AMIs (EC2 appliances) available for some common use cases.
web: Utilizes nginx and its many capabilities: software load balancing, static file serving, etc. You'd probably only have one of these, and it would probably implement some sort of connection to Amazon's EBS, or persistent storage solution, as mentioned by dcaunt.
app: An application server with Apache and PHP. You'd probably have many of these, and they'd get created automatically if more load needed to be handled. This type of server would hold copies of your ZF app.
db: A database server with MySQL. Again, you'd probably have many of these, and more slave instances would get created automatically if more load needed to be handled.
memcached: A dedicated memcached server you can use to have centralized caching, session management, et cetera across all your app instances.
The Scalr option will probably take some more configuration changes, but if you feel your scaling needs accelerating quickly it may be worth the time and effort.
Related
I am currently using an AWS micro instance as a web server for a website that allows users to upload photos. Two questions:
1) When looking at my CloudWatch metrics, I have recently noticed CPU spikes, the website receives very little traffic at the moment, but becomes utterly unusable during these spikes. These spikes can last several hours and resetting the server does not eliminate the spikes.
2) Although seemingly unrelated, whenever I post a link of my website on Twitter, the server crashes (i.e.,Error Establishing a Database Connection). Once restarting Apache and MySQL, the website returns to normal functionality.
My only guess would be that the issue is somehow the result of deficiencies with the micro instance. Unfortunately, when I upgraded to the small instance, the site was actually slower due to fact that the micro instances can have two EC2 compute units.
Any suggestions?
If you want to stay in the free tier of AWS (micro instance), you should off load as much as possible away from your EC2 instance.
I would suggest you to upload the images directly to S3 instead of going through your web server (see some example for it here: http://aws.amazon.com/articles/1434).
S3 can also be used to serve most of your web pages (images, js, css...), instead of your weak web server. You can also add these files in S3 as origin to Amazon CloudFront (CDN) distribution to improve your application performance.
Another service that can help you in off loading the work is SQS (Simple Queue Service). Instead of working with online requests from users, you can send some requests (upload done, for example) as a message to SQS and have your reader process these messages on its own pace. This is good way to handel momentary load cause by several users working simultaneously with your service.
Another service is DynamoDB (managed NoSQL DB service). You can put on dynamoDB most of your current MySQL data and queries. Amazon DynamoDB also has a free tier that you can enjoy.
With the combination of the above, you can have your micro instance handling the few remaining dynamic pages until you need to scale your service with your growing success.
Wait… I'm sorry. Did you say you were running both Apache and MySQL Server on a micro instance?
First of all, that's never a good idea. Secondly, as documented, micros have low I/O and can only burst to 2 ECUs.
If you want to continue using a resource-constrained micro instance, you need to (a) put MySQL somewhere else, and (b) use something like Nginx instead of Apache as it requires far fewer resources to run. Otherwise, you should seriously consider sizing up to something larger.
I had the same issue: As far as I understand the problem is that AWS will slow you down when you reach a predefined usage. This means that they allow for a small burst but after that things will become horribly slow.
You can test that by logging in and doing something. If you use the CPU for a couple of seconds then the whole box will become extremely slow. After that you'll have to wait without doing anything at all to get things back to "normal".
That was the main reason I went for VPS instead of AWS.
I have developed a Social Networking website based on WAMP(Windows, Apache, MySQL, PHP) server. I put it up on a Free Host (the hosting serves it in LAMP), and it works fine.
Now, I researched a little and found out that PHP applications are difficult to scale, and take a lot of parallel algorithms. I would like to test how many users does the webhost support for my website, and how much does my localhost.
It's a social network like any other, involving:
Posting data on main page(with images).
Chat between users (with polling every 3 seconds, consider as one in
facebook).
A question-answer forum(just as this one, or yahoo answers -
including upvotes,downvotes, points etc)
Two HTML5 server sent event loops running infinitely.
Many AJAX requests for retrieving data from MySQL database.
As for now, I haven't applied any Cache options, which I plan for later. Also, the chat application has to be switched from Polling to Websockets(HTML5).
My estimated user database goes to be a lot more than 100,000 users. That may need some serious scalability.
I need to know what kind of server may I need for the same. Should it be a dedicated server, should it be 2 of 'em or even more?
I tried this ab.exe located in bin folder of Apache, but it tests the location we provide manually. A social network needs login information to access all the data, which unfortunately limits the functionality of ab.exe only to availability of the "Welcome" page, and nothing towards the AJAX and HTML5 features which I mentioned above.
So, how exactly should I test the scalibilty of website for a hardware same as my laptop(Windows, Intel i5, 4gb Ram, 2.0 GHz), and what about scalability on Shared servers available out there, or even the dedicated ones.
Simply put: You are counting your chickens before they hatch. If you become overwhelmed by a bunch of new users, then that is what we tend to call a "good problem". If you're worried about scalability along the way, then you should look into:
Caching with Memcached or Redis.
Load balancing.
Switching from Apache to Nginx.
Providing a proper CDN.
Since you're using PHP, you should install an opcache.
There are a lot of different ways to squeeze out results. Until you need them, I'd suggest sticking by best practices (normalization, etc).
If you are worried about the capacity of your hosting company to cope with your application then the first thing to do is to contact them and discuss what capacity they have available and how scalable their environment is.
Unless you are hosting yourself, then you have virtually no control over the situation.
But if you believe the number of users is going to grow very quickly then it would be wise to enter a dialogue very early with your provider.
I am building a web-application and have a couple of quick questions. From what I learnt, one should not worry about scalability when initially building the app and should only start worrying when the traffic increases. However, this being my first web-application, I am not quite sure if I should take an approach where I design things in an ad-hoc manner and later "fix" them. I have been reading stories about how people start off with an app that gets millions of users in a week or two. Not that I will face the same situation but I can't help but wonder, how do these people do it?
Currently, I bought a shared hosting account on Lunarpages and that got me started in building and testing the application. However, I am interested in learning how to build the same application in a scalable-manner using the cloud, for instance, Amazon's EC2. From my understanding, I can see a couple of components:
There is a load balancer that first receives requests and then decides where to route each request
This request is then handled by a server replica that then processes the request and updates (if required) the database and sends back the response to the client
If a similar request comes in, then a caching mechanism like memcached kicks into picture and returns objects from the cache
A blackbox that handles database replication
Specifically, I am trying to do the following:
Setting up a load balancer (my homework revealed that HAProxy is one such load balancer)
Setting up replication so that databases can be synchronized
Using memcached
Configuring Apache to work with multiple web servers
Partitioning application to use Amazon EC2 and Amazon S3 (my application is something that will need great deal of storage)
Finally, how can I avoid burning myself when using Amazon services? Because this is just a learning phase, I can probably do with 2-3 servers with a simple load balancer and replication but until I want to avoid paying loads of money accidentally.
I am able to find resources on individual topics but am unable to find something that starts off from the big picture. Can someone please help me get started?
Personally, I think you should be considering how your app will scale initially - as otherwise you'll run into problems down the line.
I'm not saying you need to build it initially as a multi-server system, but if you think you'll need to do it later, be mindful of the concerns now.
In my experience, this includes things like:
Sessions. Unless you use 'sticky' load balancing, you will have to have some way of sharing session state between servers. This probably means storing session data on either shared storage, or in a DB.
File uploads and replication. If you allow users to upload files, or you have a CMS that allows you to upload images/documents, it needs to cater for the fact that these files will also need to find their way onto other nodes in your cluster. However, if you've gone down the shared storage route mentioned above, this should cover it.
DB scalability. If you're using traditional DB servers, you might want to think about how you'll implement scalability at that level. This may mean coding your app so you use one connection string for reads, and another for writes. Then, you are free to implement replication with one master node handling the inserts/updates cascading the changes to read only nodes that handle the bulk of the work.
Middleware. You might even want to go down the route of implementing some kind of message oriented middleware solution to completely hand off business logic functions - this will give you a great level of flexibility in how you wish to scale this business logic layer in the future. Although initially this will be a lot of complication and work for not a great deal of payoff.
Have you considered playing around with VMs first? You can run 2-3 VMs on your local machine and set them up like you would actual servers, they just won't be able to handle real traffic levels. If all you're looking for is the learning experience, it might be an ideal way to go about it.
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
Right now I am planning to build a news publication website that should be automatically scalable when traffic is increasing. I have good experience in developing web applications using ASP.NET and PHP. To move forward on selection of specific technology here are some questions for which I need some clarifications.
My primary intention is to reduce the hosting charges. If we choose LAMP this will cost lower than ASP.NET on Windows. As my intention is to host web application on cloud service, will this make any difference? Means, for dedicated servers we need to pay extra cost for Windows O/S if you compare it with Linux. If we go for cloud servers, will they charge anything for O/S for each instance or will they just charge on computation hours irrespective of Operating System?
Do we have complete control (like dedicated server) on cloud instance to install any other softwares?
Do we need to host web server and and database server on multiple instances when traffic is increasing or will a big size instance can handle huge traffic?
Please help me on taking a decision by clarifying above.
AppHarbor is like Heroku for ASP.NET and has a free plan (like Heroku) that includes a MSSQL database and Git based deployment. You may want to check them out.
Depends on the provider. Windows almost always has an additional cost fue to licensing.
Again, depends on the provider. Amazon EC2 and Rackspace Cloud both give you root on the instance.
Depends where your bottleneck is. If CPU and memory are in acceptable ranges, but the site runs slowly, you'll need more instances to handle the network demand. If network is fine but CPU or memory are high, a bigger instance may suffice, up to a point.
PHPFog
EngineYard
pagodabox
All of the above have free plans to start with.
If you're looking for something like Google AppEngine or Heroku for PHP you might wanna try our service http://cloudcontrol.com
i am using phpfog.com, works great, fast, and cheap
i think what your looking for is a service that Google gives, that i think is amazing, they made a cloud solution to your problem where you don't have to worry about traffic and server issues, they balance the traffic of all the projects they are hosting.
The best think about it is that you can start free, and then if your app is very popular and you need more bandwidth usage, they charge, you can check the destails here:
http://code.google.com/appengine/
they only disadvantage for you is that it is mainly in java and you need to learn there framework, but i have heard of proyects that allow you to creat your site in php
But if you want to buy a server i recomend
http://mediatemple.net/
i have tried there service and the have virtual servers, and you have full control and if you need more servers you can extend your project, since its cloud based.
hope this helps
Depending on which provider you choose, you may pay for hours your instance is live, or you may pay for actual CPU usage. Eg:
With Amazon EC2, you pay for hours your instance is up (Windows does cost a little more, but it's built-in and paid by Amazon, though there is an option to bring your own licences)
With Windows Azure, you pay for hours your instance is up
With Google App Engine, you pay for actual CPU time (and get 6.5 hours/day free)
Again, this depends on the provider, eg.:
With Amazon EC2, you have full control over the instance, so you can install anything you like (presumably the ToS has exceptions, eg. for unlicensed software)
With Google App Engine, you can use third party libraries, but you cannot "install" things. You don't get instances in the same way as a VPS, they deal with taking your code and making it run across their infrastructure.
I presume Windows Azure works similarly to App Engine here, and you don't have an actual OS install that you could install software in.
You guessed it, this depends on the provider, eg.:
With Windows Azure, you would need to add additional instances to handle increases in load.
With Amazon EC2, you also need to add additional instances. They have services like Autoscaling and Elastic Load Balancing to make this easier (Autoscaling boots up/shuts down instances based on load).
With Google App Engine, you don't need to do anything. Google will automatically fire up new instances as required, and you will pay only for the additional resources in the same way.
There are many factors influencing which would work out cheapest (in terms of development costs, managing servers, Windows licensing, etc.), so it's hard to say which would be best. You'd need to look at each one and estimate costs. I'm trying to do a similar thing for my small projects, some of the info here might be useful: http://blog.dantup.com/2010/10/google-app-engine-gae-vs-amazon-elastic-computing-ec2-vs-microsoft-azure
I'm using AppFog for my applications, they support various platforms, such as Java, PHP and Python, once you're using them, updating your application and mapping it to a url is a breeze and very quick. they also support different services such as MySQL and MongoDB.
Consider a web app in which a call to the app consists of PHP script running several MySQL queries, some of them memcached.
The PHP does not do very complex job. It is mainly serving the MySQL data with some formatting.
In the past it used to be recommended to put MySQL and the app engine (PHP/Apache) on separate boxes.
However, when the data can be divided horizontally (for example when there are ten different customers using the service and it is possible to divide the data per customer) and when Nginx +FastCGI is used instead of heavier Apache, doesn't it make sense to put Nginx Memcache and MySQL on the same box? Then when more customers come, add similar boxes?
Background: We are moving to Amazon Ec2. And a separate box for MySQL and app server means double EBS volumes (needed on app servers to keep the code persistent as it changes often). Also if something happens to the database box, more customers will fail.
Clarification: Currently the app is running with LAMP on a single server (before moving to EC2).
If your application architecture is already designed to support Nginx and MySQL on separate instances, you may want to host all your services on the same instance until you receive enough traffic that justifies the separation.
In general, creating new identical instances with the full stack (Nginx + Your Application + MySQL) will make your setup much more difficult to maintain. Think about taking backups, releasing application updates, patching the database engine, updating the database schema, generating reports on all your clients, etc. If you opt for this method, you would really need to find some big advantages in order to offset all the disadvantages.
You need to measure carefully how much memory overhead everything has - I can't see enginex vs Apache making much difference, it's PHP which will use all the RAM (this in turn depends on how many processes the web server chooses to run, but that's more of a tuning issue).
Personally I'd stay away from enginex on the grounds that it is too risky to run such a weird server in production.
Databases always need lots of ram, and the only way you can sensibly tune the memory buffers is to have them on dedicated servers. This is assuming you have big data.
If you have very small data, you could keep it on the same box.
Likewise, memcached makes almost no sense if you're not running it on dedicated boxes. Taking memory from MySQL to give to memcached is really robbing Peter to pay Paul. MySQL can cache stuff in its innodb_buffer_pool quite efficiently (This saves IO, but may end up using more CPU as you won't cache presentation logic etc, which may be possible with memcached).
Memcached is only sensible if you're running it on dedicated boxes with lots of ram; it is also only sensible if you don't have enough grunt in your db servers to serve the read-workload of your app. Think about this before deploying it.
If your application is able to work with PHP and MySQL on different servers (I don't see why this wouldn't work, actually), then, it'll also work with PHP and MySQL on the same server.
The real question is : will your servers be able to handle the load of both Apache/nginx/PHP, MySQL, and memcached ?
And there is only one way to answer that question : you have to test in a "real" "production" configuration, to determine own loaded your servers are -- or use some tool like ab, siege, or OpenSTA to "simulate" that load.
If there is not too much load with everything on the same server... Well, go with it, if it makes the hosting of your application cheapier ;-)