I have a dedicated server and I need to build a new version of my personal PHP5 CMS for my customers. Setting aside the question of whether I should consider using open source, I need your opinions regarding CMS architecture.
The first approach (since the server is completely under my control) is to build a centralized system that can support multiple sites from a single administration panel. The basic idea is that I can log in as a super user, create a new site (technically this creates a new web root and a new database, and maybe some other things), and assign modules and plug-ins to a specific customer, or develop new ones if needed. If a customer logs in to this panel, he/she sees and can manage only their own site's content.
I have seen such a system (it was custom built); it's very nice that bug fixes and new features reach all customers instantly, without having to patch every individual CMS installation, which might even live on another hosting server...
The negative aspect I can see is scalability: what if I need to add a second server? How do I merge them so that they still share a single core?
The second approach is the classical one: a stand-alone CMS installation for every customer.
Which way would you go, and why?
Thank you for your time.
If you were to have one central system for all clients, scalability could become easier. You can have one big database server and several identical web servers (probably behind a load balancer), and that way you don't have to worry about dividing the clients up into different servers. Your resources are pooled so if one client has a day with heavy traffic it can be taken up by several servers, rather than bringing one server (and all other clients' sites on it) to its knees.
You can get PHP sessions to work across multiple servers either by using 'sticky sessions' on your load-balancing configuration, or by getting PHP to store the data somewhere accessible to all servers (e.g. a database).
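For the database route, PHP (5.4+) lets you register your own session handler. Here is a minimal sketch, assuming a sessions table with id, data and updated_at columns (the table, columns and connection details are placeholders, not an existing schema):

    <?php
    // Minimal sketch of a database-backed session handler so every web server
    // behind the load balancer reads and writes the same session data.
    class DbSessionHandler implements SessionHandlerInterface
    {
        private $pdo;

        public function __construct(PDO $pdo)
        {
            $this->pdo = $pdo;
        }

        public function open($savePath, $sessionName) { return true; }
        public function close() { return true; }

        public function read($id)
        {
            $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ?');
            $stmt->execute(array($id));
            $row = $stmt->fetch(PDO::FETCH_ASSOC);
            return $row ? $row['data'] : '';
        }

        public function write($id, $data)
        {
            $stmt = $this->pdo->prepare('REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, NOW())');
            return $stmt->execute(array($id, $data));
        }

        public function destroy($id)
        {
            return $this->pdo->prepare('DELETE FROM sessions WHERE id = ?')->execute(array($id));
        }

        public function gc($maxlifetime)
        {
            return $this->pdo->prepare('DELETE FROM sessions WHERE updated_at < NOW() - INTERVAL ? SECOND')->execute(array($maxlifetime));
        }
    }

    $pdo = new PDO('mysql:host=db.internal;dbname=cms', 'user', 'pass'); // placeholder credentials
    session_set_save_handler(new DbSessionHandler($pdo), true);
    session_start();

The same handler works unchanged whether you have two web servers or ten, because the session state lives with the database rather than on any one box.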
Keeping the web application files synchronised to one code base shouldn't be too difficult; tools like rsync can help with that.
It really depends on the types of sites. That said, I would suggest you consider using version control software to manage multiple installations. In practice, this can give you much the same benefit as a centralised approach, but it leaves you the freedom to postpone updating a single site (or a number of sites).
Situation:
For a web shop, I want to build paged product lists - and filters on these lists - using Elasticsearch. I want to bypass the PHP/MySQL server on which the application runs entirely and communicate with Elasticsearch directly from the customer's browser through AJAX calls. Advantages are:
A large portion of the load on the PHP/MySQL server will be handled by the ES cluster instead
CDN opportunities (scaling!)
Problem:
This approach would take a massive load off our backend server, but it creates a few new issues. Anonymous users will generate lots of requests, and we need some control over those:
Traffic control:
How to defend against malicious users making lots of calls and scanning/downloading our entire product catalogue that way? (e.g. competition scraping pricing information)
How can I block IP's that have been identified (somehow) as behaving badly?
Access control:
How to make sure the frontend can only make the queries we want to allow?
How to make sure customers only see a selection of the result fields and can't get any data out of ES that's not intended for them?
It's essential not to have a single machine somewhere taking care of all this, because that would just recreate a single point responsible for handling everything. I want to take real advantage of the ES cluster without adding middleware that has to deal with the same scaling issues.
We don't want to be fully dependent on a third party; we're looking for a solution with some flexibility regarding the partners we work with (e.g. the ability to switch between Elastic and AWS).
Possible solutions or partial solutions:
I've been looking at a few 'Elasticsearch as a service' options, but I'm not confident about their quality, or even whether they can solve the issues mentioned above:
www.elastic.co/found: their premium solution has a 'Shield' service, which does not seem to cover all of the cases mentioned above (only IP blocking, as far as I can tell). There is, however, a custom plugin (https://github.com/floragunncom/search-guard) that can filter result fields and provides a way to do user management, etc. This seems like a reasonable option, but it is expensive and ties the application to the 'Found' product; we should be able to switch partners should the need arise.
Amazon's AWS Elasticsearch Service has basic IAM support, and it's possible to put CloudFront in front of it, but it does not provide the kind of access control described above.
Installing a separate L7 application filtering solution for detecting scrapers etc.
Question:
Is there anyone out there who has this type of approach working and found a good setup that tackles all of these issues?
The first thing I would recommend is to restrict access to your Elasticsearch instance behind a security group and only allow your application server's IP address access on ports 22, 80, 9200, and 9300 (9200 and 9300 being the ports Elasticsearch actually uses).
As for protecting against scraping, there is no absolute solution. However, if your aim is simply to limit the load these scrapers put on your application server and ES instance, take a look at https://github.com/davedevelopment/stiphle, which is aimed at rate-limiting users. The example on its page limits users to 5 requests a second, which seems very reasonable for the average user and could be lowered even further if need be, making scraping a time-consuming effort.
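To give an idea of what that looks like in practice, here is a rough rate-limiting sketch using APCu as the counter store; the limits, key format and 429 response are assumptions, and a library like stiphle wraps the same idea more robustly. Note that APCu counters are local to each web server, so with several front ends you would move the counter into memcached or Redis instead:

    <?php
    // Rough per-IP throttle for the endpoint that forwards search requests to
    // Elasticsearch. Limits and key names are illustrative, not recommendations.
    $ip     = $_SERVER['REMOTE_ADDR'];
    $window = 60;                                   // seconds
    $limit  = 120;                                  // max requests per window per IP
    $key    = 'ratelimit:' . $ip . ':' . floor(time() / $window);

    apcu_add($key, 0, $window);                     // create the counter if it is missing
    $count = apcu_inc($key);                        // atomic increment

    if ($count > $limit) {
        header('HTTP/1.1 429 Too Many Requests');
        header('Retry-After: ' . $window);
        exit;
    }

    // ...otherwise pass the (whitelisted) search request on to Elasticsearch.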
I have developed a social networking website on a WAMP (Windows, Apache, MySQL, PHP) server. I put it up on a free host (which serves it on a LAMP stack), and it works fine.
Now, I have researched a little and read that PHP applications are difficult to scale and require a lot of parallelisation work. I would like to test how many users the web host can support for my website, and how many my localhost can.
It's a social network like any other, involving:
Posting data on the main page (with images).
Chat between users (polling every 3 seconds, similar to Facebook chat).
A question-and-answer forum (just like this one, or Yahoo Answers, with upvotes, downvotes, points, etc.).
Two HTML5 server-sent event loops running indefinitely (a rough sketch of one follows after this list).
Many AJAX requests retrieving data from the MySQL database.
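To give an idea of the load profile, each of those event loops is roughly the following; getLatestEvents() stands in for the real database or cache lookup:

    <?php
    // Rough shape of one server-sent events endpoint. Every connected browser
    // keeps a PHP worker occupied in this loop, which is what makes these the
    // main capacity concern. getLatestEvents() is a placeholder for the real lookup.
    header('Content-Type: text/event-stream');
    header('Cache-Control: no-cache');
    set_time_limit(0);

    while (!connection_aborted()) {
        $events = getLatestEvents();            // placeholder: query MySQL or a cache
        echo 'data: ' . json_encode($events) . "\n\n";
        @ob_flush();
        flush();
        sleep(3);                               // same cadence as the chat polling
    }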
As of now, I haven't applied any caching, which I plan to add later. Also, the chat application has to be switched from polling to WebSockets (HTML5).
My estimated user base is well over 100,000 users, which may require some serious scalability.
I need to know what kind of server I may need for this. Should it be a dedicated server? Two of them, or even more?
I tried ab.exe, located in Apache's bin folder, but it only tests the URL we provide manually. A social network needs login information to access most of its data, which unfortunately limits ab.exe to checking the availability of the "Welcome" page, and tells me nothing about the AJAX and HTML5 features I mentioned above.
So how exactly should I test the scalability of the website on hardware like my laptop (Windows, Intel i5, 4 GB RAM, 2.0 GHz), and what about scalability on the shared servers available out there, or even dedicated ones?
Simply put: You are counting your chickens before they hatch. If you become overwhelmed by a bunch of new users, then that is what we tend to call a "good problem". If you're worried about scalability along the way, then you should look into:
Caching with Memcached or Redis.
Load balancing.
Switching from Apache to Nginx.
Providing a proper CDN.
Since you're using PHP, you should install an opcache.
There are a lot of different ways to squeeze out more performance. Until you need them, I'd suggest sticking to best practices (normalization, etc.).
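To make the caching point concrete, a typical read-through cache with the PHP Memcached extension looks roughly like this; the key, TTL and query are purely illustrative:

    <?php
    // Read-through cache sketch: try Memcached first, fall back to MySQL on a miss.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    $key   = 'feed:recent_posts';                // illustrative cache key
    $posts = $mc->get($key);

    if ($posts === false) {                      // cache miss
        $pdo   = new PDO('mysql:host=localhost;dbname=social', 'user', 'pass');
        $stmt  = $pdo->query('SELECT id, title, body FROM posts ORDER BY created_at DESC LIMIT 20');
        $posts = $stmt->fetchAll(PDO::FETCH_ASSOC);
        $mc->set($key, $posts, 300);             // keep it for five minutes
    }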
If you are worried about the capacity of your hosting company to cope with your application then the first thing to do is to contact them and discuss what capacity they have available and how scalable their environment is.
Unless you are hosting it yourself, you have virtually no control over the situation.
But if you believe the number of users is going to grow very quickly then it would be wise to enter a dialogue very early with your provider.
I am building a web-application and have a couple of quick questions. From what I learnt, one should not worry about scalability when initially building the app and should only start worrying when the traffic increases. However, this being my first web-application, I am not quite sure if I should take an approach where I design things in an ad-hoc manner and later "fix" them. I have been reading stories about how people start off with an app that gets millions of users in a week or two. Not that I will face the same situation but I can't help but wonder, how do these people do it?
Currently, I bought a shared hosting account on Lunarpages and that got me started in building and testing the application. However, I am interested in learning how to build the same application in a scalable-manner using the cloud, for instance, Amazon's EC2. From my understanding, I can see a couple of components:
There is a load balancer that first receives requests and then decides where to route each request
This request is then handled by a server replica that then processes the request and updates (if required) the database and sends back the response to the client
If a similar request comes in, a caching mechanism like memcached kicks in and returns objects from the cache
A blackbox that handles database replication
Specifically, I am trying to do the following:
Setting up a load balancer (my homework revealed that HAProxy is one such load balancer)
Setting up replication so that databases can be synchronized
Using memcached
Configuring Apache to work with multiple web servers
Partitioning application to use Amazon EC2 and Amazon S3 (my application is something that will need great deal of storage)
Finally, how can I avoid burning myself when using Amazon services? Because this is just a learning phase, I can probably make do with 2-3 servers and a simple load balancer plus replication, but I want to avoid accidentally paying loads of money.
I am able to find resources on individual topics but am unable to find something that starts off from the big picture. Can someone please help me get started?
Personally, I think you should be considering how your app will scale initially - as otherwise you'll run into problems down the line.
I'm not saying you need to build it initially as a multi-server system, but if you think you'll need to do it later, be mindful of the concerns now.
In my experience, this includes things like:
Sessions. Unless you use 'sticky' load balancing, you will have to have some way of sharing session state between servers. This probably means storing session data on either shared storage, or in a DB.
File uploads and replication. If you allow users to upload files, or you have a CMS that allows you to upload images/documents, it needs to cater for the fact that these files will also need to find their way onto other nodes in your cluster. However, if you've gone down the shared storage route mentioned above, this should cover it.
DB scalability. If you're using traditional DB servers, you might want to think about how you'll implement scalability at that level. This may mean coding your app so you use one connection string for reads, and another for writes. Then you are free to implement replication, with one master node handling inserts/updates and cascading the changes to read-only nodes that handle the bulk of the read work (a minimal sketch follows after this list).
Middleware. You might even want to go down the route of implementing some kind of message-oriented middleware solution to completely hand off business logic functions - this will give you a great deal of flexibility in how you scale that business logic layer in the future, although initially it will be a lot of complication and work for not much payoff.
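Here is a minimal sketch of that read/write split; the class, hostnames and credentials are placeholders, not a prescribed design:

    <?php
    // Two connection strings: writes go to the master, reads to a replica.
    class Db
    {
        private static $write;
        private static $read;

        public static function write()
        {
            if (self::$write === null) {
                self::$write = new PDO('mysql:host=db-master.internal;dbname=app', 'user', 'pass');
            }
            return self::$write;
        }

        public static function read()
        {
            if (self::$read === null) {
                self::$read = new PDO('mysql:host=db-replica.internal;dbname=app', 'user', 'pass');
            }
            return self::$read;
        }
    }

    // Reads hit the replica, writes hit the master:
    $users = Db::read()->query('SELECT id, name FROM users LIMIT 10')->fetchAll(PDO::FETCH_ASSOC);
    Db::write()->prepare('UPDATE users SET last_seen = NOW() WHERE id = ?')->execute(array(42));

Putting this split in early costs almost nothing; later, scaling the read side is just a matter of pointing the read connection string at a replica (or a pool of them).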
Have you considered playing around with VMs first? You can run 2-3 VMs on your local machine and set them up like you would actual servers, they just won't be able to handle real traffic levels. If all you're looking for is the learning experience, it might be an ideal way to go about it.
I'm planning an application that allows users to create a specific type of website. I want to link account names to 'account.myapp.com', so that going to 'account.myapp.com' serves up that account's website. I haven't a clue how to map this. I'll be using CodeIgniter as my development framework.
I would like to give users the ability to add a registered domain name for their website, rather than using the standard subdomain. Any tips or methods for this?
Also, what are some pitfalls and problems I should be looking out for when developing something with this design? My biggest concern is backing myself into a corner with bad database design, and creating a nightmare of an app to maintain or update.
Thanks for your time!
Your plan for having a single app that serves all the sites is going to be quite an undertaking and a lot of work. That is not to say it isn't possible (plenty of enterprise CMSes, including SharePoint, allow you to run 'virtual sites' etc. from a single install).
You are going to need to undertake a lot of planning and design, specifically on the security front, to make sure the individual sites operate in isolation. I assume that each site will have its own account(s) - you are going to have to do a lot of work to make sure users can't accidentally (or maliciously) start editing another site.
And you are right to consider maintenance - if you have all the sites running under a single application, and therefore a single database, that database is going to get big and messy very quickly. It also then becomes a single point of failure.
A better plan would be to develop a self-contained solution (for a single website) - this can then run from its own directory, with its own database and its own set of accounts. It would be significantly smaller (in terms of both code and database) and would therefore probably perform a lot better. Day-to-day maintenance would be easier (restoring a website from backup), while software updates (to add a new feature) would be a bit trickier, though since it's PHP it's just file uploads and SQL patches, so you can automate this with ease.
In terms of domains: if you went with the individual-app (one per website) approach, you can use Apache's dynamic virtual hosts feature, which effectively maps a URL to the filesystem (so website.mydomain.com could automatically be served from /home/vhosts/com/mydomain/website): deploying a new website would then be a matter of simply copying the files into the correct directory, creating the database, and updating a config file, all of which could be automated with ease.
If users want to use their own domains, then they firstly have to update their DNS to point to your server, and secondly you would need to configure an Apache vhost for that domain, which would probably involve a restart of Apache and thus affect all other users.
This is very easy to do in CodeIgniter and can be done entirely using routes.php and a pre-controller hook.
THAT BEING SAID... Don't do this. It's generally not a good idea. Even 37signals, who made this sort of account management famous, is recanting and moving towards centralized accounts. See http://37signals.com/accounts
If I'm understanding your question correctly, you need to set up wildcard DNS and use Apache mod_rewrite to internally redirect (for example) myaccount.myapp.com to myapp.com/?account=myaccount. Your app logic can take it from there.
I just Googled "wildcard dns mod_rewrite account" (without quotes) and found some examples with instructions, such as:
http://www.reconn.us/content/view/46/67/
This is a valid and desirable way to structure certain web apps IMO. I'm not aware of serious drawbacks.
I don't really know of a great (automated/scalable) way to let users specify their own individual domain names, but you might be able to do it by having them modify their domain's DNS to point to your web server, and then adding a ServerAlias directive to your "myapp" Apache configuration. You're still left with the problem of your myapp runtime instance understanding that requests coming through a customer's domain are specific to a customer account, so (for example) customeraccount.com really equates to myapp.com/?account=customeraccount. Some more mod_rewrite rules could probably take care of this, but it's not automated (perhaps it could be, though, with an include file or some such).
Sorry, you said you were using CodeIgniter ... substitute myapp.com/account/myaccount wherever I wrote myapp.com/?account=myaccount.
First, you have to set up a wildcard DNS record and configure Apache (or whatever web server you're using) so that every subdomain is served by the same application (wildcard match). Then you have to parse the subdomain from the request's host name, set it as a parameter (or whatever that's called in CodeIgniter), and have some fun with it in your controller.
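As a rough sketch of that parsing step, one simple option is a shared base controller rather than routes.php; the property name and the handling of www/bare domains below are assumptions:

    <?php
    // application/core/MY_Controller.php -- pulls the account name out of the
    // Host header so every controller knows which site it is serving.
    class MY_Controller extends CI_Controller
    {
        protected $account;

        public function __construct()
        {
            parent::__construct();

            $host  = $_SERVER['HTTP_HOST'];        // e.g. "myaccount.myapp.com"
            $parts = explode('.', $host);

            // Treat the left-most label as the account; bare domain and www are not accounts.
            $sub = count($parts) > 2 ? $parts[0] : '';
            if ($sub === '' || $sub === 'www') {
                show_404();
            }

            $this->account = $sub;
            // From here you could load the matching site record, theme, etc.
        }
    }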
About pitfalls and problems: that depends on what you want to do. ;)
I recently experienced a flood of traffic on a Facebook app I created (mostly for the sake of education, not with any intention of marketing)
Needless to say, I did not think about scalability when I created the app. I'm now in a position where my meager virtual server hosted by Media Temple isn't cutting it at all, and it really comes down to the raw I/O of the machine. Since this project has been so educational for me so far, I figured I'd take it as an opportunity to understand the Amazon EC2 platform.
The app itself is created in PHP (using Zend Framework) with a MySQL backend. I use application caching wherever possible with memcached. I've spent the weekend playing around with EC2, spinning up instances, installing the packages I want, and mounting an EBS volume to an instance.
But what's the next logical step that is going to yield good results for scalability? Do I fire up one AMI instance for MySQL and one for the Apache service? Or do I just replicate the instances as many times as I need and do some sort of load balancing on the front end? Ideally, I'd like to have a centralized database because I aggregate statistics across all database rows; however, this is not a hard requirement (there are probably some application-specific solutions I could come up with to work around it).
I know there probably isn't a straightforward answer to this, so opinions and suggestions are welcome.
So many questions - all of them good though.
In terms of scaling, you've a few options.
The first is to start with a single box and scale upwards with a more powerful box. EC2 has various instance sizes. This involves a server migration each time you want a bigger box.
Easier is to add servers. You can start with a single instance for Apache & MySQL. Then when traffic increases, create a separate instance for MySQL and point your application to this new instance. This creates a nice layer between application and database. It sounds like this is a good starting point based on your traffic.
Next you'll probably need more application power (web servers) or more database power (MySQL cluster etc.). You can have your DNS records pointing to a couple of front boxes running some load balancing software (try Pound). These load balancing servers distribute requests to your webservers. EC2 has Elastic Load Balancing which is an alternative to managing this yourself, and is probably easier - I haven't used it personally.
Something else to be aware of: EC2 instance storage is not persistent. You have to manage persistent data yourself using Elastic Block Store (EBS). This guide is an excellent tutorial on how to do this, with automated backups.
I recommend that you purchase some reserved instances if you decide EC2 is the way forward. You'll save yourself about 50% over 3 years!
Finally, you may be interested in services like RightScale which offer management services at a cost. There are other providers available.
The first step is to separate concerns. I'd split off a separate MySQL server and possibly a dedicated memcached box, depending on how high your load is there. Then I'd monitor memory and CPU usage on each box and optimize where possible. This can be done by spinning up new Media Temple boxes. I'd also suggest Slicehost as a cheaper, more developer-friendly alternative.
Some more low-budget PHP deployment optimizations:
Using a more efficient web server like nginx to handle static file serving, reverse-proxying app requests to a separate Apache instance.
Implementing PHP with FastCGI on top of nginx using something like PHP-FPM, getting rid of Apache entirely. This may be a great alternative if your Apache needs don't extend far beyond mod_rewrite and simpler Apache modules.
If you prefer a more high-level, do-it-yourself approach, you may want to check out Scalr (code at Google Code). It's worth watching the video on their web site. It facilitates a scalable hosting environment using Amazon EC2. The technology is open source, so you can download it and run it yourself on your own management server (your Media Temple box, perhaps?). Scalr has pre-built AMIs (EC2 appliances) available for some common use cases:
web: Utilizes nginx and its many capabilities: software load balancing, static file serving, etc. You'd probably only have one of these, and it would probably implement some sort of connection to Amazon's EBS persistent storage, as mentioned by dcaunt.
app: An application server with Apache and PHP. You'd probably have many of these, and they'd get created automatically if more load needed to be handled. This type of server would hold copies of your ZF app.
db: A database server with MySQL. Again, you'd probably have many of these, and more slave instances would get created automatically if more load needed to be handled.
memcached: A dedicated memcached server you can use to have centralized caching, session management, et cetera across all your app instances.
The Scalr option will probably take some more configuration changes, but if you feel your scaling needs are accelerating quickly, it may be worth the time and effort.
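As an example of the session management part of the memcached role above: if the pecl memcached extension is installed, pointing PHP's built-in session handling at that dedicated memcached box only takes a couple of settings (the hostname is a placeholder):

    <?php
    // Store PHP sessions on the shared memcached server so any app instance
    // can serve any user, regardless of which web server handled the last request.
    ini_set('session.save_handler', 'memcached');
    ini_set('session.save_path', 'memcached.internal:11211');

    session_start();
    $_SESSION['user_id'] = 42;   // now visible from every app server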