I am building a web-application and have a couple of quick questions. From what I learnt, one should not worry about scalability when initially building the app and should only start worrying when the traffic increases. However, this being my first web-application, I am not quite sure if I should take an approach where I design things in an ad-hoc manner and later "fix" them. I have been reading stories about how people start off with an app that gets millions of users in a week or two. Not that I will face the same situation but I can't help but wonder, how do these people do it?
Currently, I bought a shared hosting account on Lunarpages and that got me started in building and testing the application. However, I am interested in learning how to build the same application in a scalable-manner using the cloud, for instance, Amazon's EC2. From my understanding, I can see a couple of components:
There is a load balancer that first receives requests and then decides where to route each request
This request is then handled by a server replica that then processes the request and updates (if required) the database and sends back the response to the client
If a similar request comes in, then a caching mechanism like memcached kicks into picture and returns objects from the cache
A blackbox that handles database replication
Specifically, I am trying to do the following:
Setting up a load balancer (my homework revealed that HAProxy is one such load balancer)
Setting up replication so that databases can be synchronized
Using memcached
Configuring Apache to work with multiple web servers
Partitioning application to use Amazon EC2 and Amazon S3 (my application is something that will need great deal of storage)
Finally, how can I avoid burning myself when using Amazon services? Because this is just a learning phase, I can probably do with 2-3 servers with a simple load balancer and replication but until I want to avoid paying loads of money accidentally.
I am able to find resources on individual topics but am unable to find something that starts off from the big picture. Can someone please help me get started?
Personally, I think you should be considering how your app will scale initially - as otherwise you'll run into problems down the line.
I'm not saying you need to build it initially as a multi-server system, but if you think you'll need to do it later, be mindful of the concerns now.
In my experience, this includes things like:
Sessions. Unless you use 'sticky' load balancing, you will have to have some way of sharing session state between servers. This probably means storing session data on either shared storage, or in a DB.
File uploads and replication. If you allow users to upload files, or you have a CMS that allows you to upload images/documents, it needs to cater for the fact that these files will also need to find their way onto other nodes in your cluster. However, if you've gone down the shared storage route mentioned above, this should cover it.
DB scalability. If you're using traditional DB servers, you might want to think about how you'll implement scalability at that level. This may mean coding your app so you use one connection string for reads, and another for writes. Then, you are free to implement replication with one master node handling the inserts/updates cascading the changes to read only nodes that handle the bulk of the work.
Middleware. You might even want to go down the route of implementing some kind of message oriented middleware solution to completely hand off business logic functions - this will give you a great level of flexibility in how you wish to scale this business logic layer in the future. Although initially this will be a lot of complication and work for not a great deal of payoff.
Have you considered playing around with VMs first? You can run 2-3 VMs on your local machine and set them up like you would actual servers, they just won't be able to handle real traffic levels. If all you're looking for is the learning experience, it might be an ideal way to go about it.
Related
I have developed a Social Networking website based on WAMP(Windows, Apache, MySQL, PHP) server. I put it up on a Free Host (the hosting serves it in LAMP), and it works fine.
Now, I researched a little and found out that PHP applications are difficult to scale, and take a lot of parallel algorithms. I would like to test how many users does the webhost support for my website, and how much does my localhost.
It's a social network like any other, involving:
Posting data on main page(with images).
Chat between users (with polling every 3 seconds, consider as one in
facebook).
A question-answer forum(just as this one, or yahoo answers -
including upvotes,downvotes, points etc)
Two HTML5 server sent event loops running infinitely.
Many AJAX requests for retrieving data from MySQL database.
As for now, I haven't applied any Cache options, which I plan for later. Also, the chat application has to be switched from Polling to Websockets(HTML5).
My estimated user database goes to be a lot more than 100,000 users. That may need some serious scalability.
I need to know what kind of server may I need for the same. Should it be a dedicated server, should it be 2 of 'em or even more?
I tried this ab.exe located in bin folder of Apache, but it tests the location we provide manually. A social network needs login information to access all the data, which unfortunately limits the functionality of ab.exe only to availability of the "Welcome" page, and nothing towards the AJAX and HTML5 features which I mentioned above.
So, how exactly should I test the scalibilty of website for a hardware same as my laptop(Windows, Intel i5, 4gb Ram, 2.0 GHz), and what about scalability on Shared servers available out there, or even the dedicated ones.
Simply put: You are counting your chickens before they hatch. If you become overwhelmed by a bunch of new users, then that is what we tend to call a "good problem". If you're worried about scalability along the way, then you should look into:
Caching with Memcached or Redis.
Load balancing.
Switching from Apache to Nginx.
Providing a proper CDN.
Since you're using PHP, you should install an opcache.
There are a lot of different ways to squeeze out results. Until you need them, I'd suggest sticking by best practices (normalization, etc).
If you are worried about the capacity of your hosting company to cope with your application then the first thing to do is to contact them and discuss what capacity they have available and how scalable their environment is.
Unless you are hosting yourself, then you have virtually no control over the situation.
But if you believe the number of users is going to grow very quickly then it would be wise to enter a dialogue very early with your provider.
Is there a way to store/manage PHP sessions in a similar way that the IIS (Session State Service) ?
I want to have multiple front end web servers for an multi domain e-commerce platform and manage the sessions centrally. The idea being that is a server goes down users with cart contents will not have to start a new session when they are shifted to a another web server.
I know cookies and URL parameters could do it to a point but that's not answering the question.
You can register a SessionHandlerInterface which is backed by a shared database (e.g. MySQL Cluster).
For anyone looking for this because they are moving to Amazon Web Services, there are two options/alternatives:
Use the DynamoDB session handler from the AWS SDK for PHP. This essentially has the same effect as session replication. However, there are monetary costs from DynamoDB, especially if you need locking.
Use session stickiness in the load balancer. This is simpler to set up, and free, but is probably not quite as scalable, as requests from old sessions can't just be sent on to newly started servers.
The most scalable option is of course to get rid of server-side sessions, but that is not always easy without huge changes in backends and frontends, and in some cases not even desirable because of other considerations.
I am developing an iPhone app and would like to create some sort of RESTful API so different users of the app can share information/data. To create a community of sorts.
Say my app is some sort of game, and I want the user to be able to post their highscore on a global leaderboard as well as maintain a list of friends and see their scores. My app is nothing like this but it shows the kind of collective information access I need to implement.
The way I could implement this is to set up a PHP and MySQL server and have a php script that interacts with the database and mediates the requests between the DB and each user on the iPhone, by taking a GET request and returning a JSON string.
Is this a good way to do it? Seems to me like using PHP is a slow way to implement this as opposed to say a compiled language. I could be very wrong though. I am trying to keep my hosting bills down because I plan to release the app for free. I do recognise that an implementation that performs better in terms of CPU cycles and RAM usage (e.g. something compiled written in say C#?) might require more expensive hosting solutions than say a LAMP server so might actually end up being more expensive in terms of $/request.
I also want my implementation to be scalable in the rare case that a lot of people start using the app. Does the usage volume shift the performance/$ ratio towards a different implementation? I.e. if I have 1k request/day it might be cheaper to use PHP+MySQL, but 1M requests/day might make using something else cheaper?
To summarise, how would you implement a (fairly simple) remote database that would be accessed remotely using HTTP(S) in order to minimise hosting bills? What kind of hosting solution and what kind of platform/language?
UPDATE: per Karl's suggestion I tried: Ruby (language) + Sinatra (framework) + Heroku (app hosting) + Amazon S3 (static file hosting). To anyone reading this who might have the same dilemma I had, this setup is amazing: effortlessly scalable (to "infinity"), affordable, easy to use. Thanks Karl!
Can't comment on DB specifics yet because I haven't implemented that yet although for my simple query requirements, CouchDB and MongoDB seem like good choices and they are integrated with Heroku.
Have you considered using Sinatra and hosting it on [Heroku]? This is exactly what Sinatra excels at (REST services). And hosting with Heroku may be free, depending on the amount of data you need to store. Just keep all your supporting files (images, javascript, css) on S3. You'll be in the cloud and flying in no time.
This may not fit with your PHP desires, but honestly, it doesn't get any easier than Sinatra.
It comes down to a tradeoff between cost vs experience.
if you have the expertise, I would definitely look into some form of cloud based infrastructure, something like Google App Engine. Which cloud platform you go with depends on what experience you have with different languages (AppEngine only works with Python/Java for e.g). Generally though, scalable cloud based platforms have more "gotchas" and need more know-how, because they are specifically tuned for high-end scalability (and thus require knowledge of enterprise level concepts in some cases).
If you want to be up and running as quickly and simply as possible I would personally go for a CakePHP install. Setup the model data to represent the basic entities you are managing, then use CakePHP's wonderful convention-loving magic to expose CRUD updates on these models with ease!
The technology you use to implement the REST services will have a far less significant impact on performance and hosting costs than the way you use HTTP. Learning to take advantage of HTTP is far more than simply learning how to use GET, PUT, POST and DELETE.
Use whatever server side technology you already know and spend some quality time reading RFC2616. You'll save yourself a ton of time and money.
In your case its database server that's accessed on each request. so even if you have compiled language (say C# or java) it wont matter much (unless you are doing some data transformation or processing).
So DB server have to scale well. here your choice of language and DB should be well configured with host OS.
In short PHP+MySQL is good if you are sending/receiving JSON strings and storing/retrieving in DB with minimum data processing.
next app gets popular and if your app don't require frequent updates to existing data then you can move such data to very high scalable databases like MongoDB (JSON friendly).
I have a dedicated server and I'm in need for building new version of my personal PHP5 CMS for my customers. Setting aside questions whether i should consider using open source, i need your opinions regarding CMS architecture.
First approach (since the server is completely in my control) is to build a centralized system that can support multiple sites from single administration panel. Basic idea is that I am able to log-in as super user, create new site (technically it creates new web root and new database, and maybe some other things), assign modules and plug-ins for specific customer or develop new ones if needed. If customer log-ins at this panel, he/she sees and can manage only their site content.
I have seen such a system (it was custom build), it's very nice to bug fixes and new features affects all customers instantly without need of patching every CMS that can also be on other hosting server...
The negative aspect I can see is scalability - what if i will need to add second server, how do I merge them to maintain single core.
The second is classical - stand-alone CMS for every customer.
What way will you go and why?
Thank you for your time.
If you were to have one central system for all clients, scalability could become easier. You can have one big database server and several identical web servers (probably behind a load balancer), and that way you don't have to worry about dividing the clients up into different servers. Your resources are pooled so if one client has a day with heavy traffic it can be taken up by several servers, rather than bringing one server (and all other clients' sites on it) to its knees.
You can get PHP sessions to work across multiple servers either by using 'sticky sessions' on your load-balancing configuration, or by getting PHP to store the data somewhere accessible to all servers (e.g. a database).
Keeping the web application files synchronised to one code base shouldn't be too difficult, there are tools like rsync that you could use to help you.
It really depends on the types of sites. That said, I would suggest that you consider using version control software to manage multiple installations. In practise, this can give you the same as with a centralised approach, but it gives you the freedom to postpone update of a single (or a number of) sites.
I recently experienced a flood of traffic on a Facebook app I created (mostly for the sake of education, not with any intention of marketing)
Needless to say, I did not think about scalability when I created the app. I'm now in a position where my meager virtual server hosted by MediaTemple isn't cutting it at all, and it's really coming down to raw I/O of the machine. Since this project has been so educating to me so far, I figured I'd take this as an opportunity to understand the Amazon EC2 platform.
The app itself is created in PHP (using Zend Framework) with a MySQL backend. I use application caching wherever possible with memcached. I've spent the weekend playing around with EC2, spinning up instances, installing the packages I want, and mounting an EBS volume to an instance.
But what's the next logical step that is going to yield good results for scalability? Do I fire up an AMI instance for the MySQL and one for the Apache service? Or do I just replicate the instances out as many times as I need them and then do some sort of load balancing on the front end? Ideally, I'd like to have a centralized database because I do aggregate statistics across all database rows, however, this is not a hard requirement (there are probably some application specific solutions I could come up with to work around this)
I know this is probably not a straight forward answer, so opinions and suggestions are welcome.
So many questions - all of them good though.
In terms of scaling, you've a few options.
The first is to start with a single box. You can scale upwards - with a more powerful box. EC2 have various sized instances. This involves a server migration each time you want a bigger box.
Easier is to add servers. You can start with a single instance for Apache & MySQL. Then when traffic increases, create a separate instance for MySQL and point your application to this new instance. This creates a nice layer between application and database. It sounds like this is a good starting point based on your traffic.
Next you'll probably need more application power (web servers) or more database power (MySQL cluster etc.). You can have your DNS records pointing to a couple of front boxes running some load balancing software (try Pound). These load balancing servers distribute requests to your webservers. EC2 has Elastic Load Balancing which is an alternative to managing this yourself, and is probably easier - I haven't used it personally.
Something else to be aware of - EC2 has no persistent storage. You have to manage persistent data yourself using the Elastic Block Store. This guide is an excellent tutorial on how to do this, with automated backups.
I recommend that you purchase some reserved instances if you decide EC2 is the way forward. You'll save yourself about 50% over 3 years!
Finally, you may be interested in services like RightScale which offer management services at a cost. There are other providers available.
First step is to separate concerns. I'd split off with a separate MySQL server and possibly a dedicated memcached box, depending on how high your load is there. Then I'd monitor memory and CPU usage on each box and see where you can optimize where possible. This can be done with spinning off new Media Temple boxes. I'd also suggest Slicehost for a cheaper, more developer-friendly alternative.
Some more low-budget PHP deployment optimizations:
Using a more efficient web server like nginx to handle static file serving and then reverse proxy app requests to a separate Apache instance
Implement PHP with FastCGI on top of nginx using something like PHP-FPM, getting rid of Apache entirely. This may be a great alternative if your Apache needs don't extend far beyond mod_rewrite and simpler Apache modules.
If you prefer a more high-level, do-it-yourself approach, you may want to check out Scalr (code at Google Code). It's worth watching the video on their web site. It facilities a scalable hosting environment using Amazon EC2. The technology is open source, so you can download it and implement it yourself on your own management server. (Your Media Temple box, perhaps?) Scalr has pre-built AMIs (EC2 appliances) available for some common use cases.
web: Utilizes nginx and its many capabilities: software load balancing, static file serving, etc. You'd probably only have one of these, and it would probably implement some sort of connection to Amazon's EBS, or persistent storage solution, as mentioned by dcaunt.
app: An application server with Apache and PHP. You'd probably have many of these, and they'd get created automatically if more load needed to be handled. This type of server would hold copies of your ZF app.
db: A database server with MySQL. Again, you'd probably have many of these, and more slave instances would get created automatically if more load needed to be handled.
memcached: A dedicated memcached server you can use to have centralized caching, session management, et cetera across all your app instances.
The Scalr option will probably take some more configuration changes, but if you feel your scaling needs accelerating quickly it may be worth the time and effort.