So I wrote this nice SAAS solution and got some real clients. Now a request was made by a client to add some functionality that requires websockets.
In order to keep things tidy, I would like to use another server to manage the websockets.
My current stack is an AWS application loadbalancer, behind it two servers - one is the current application server. It's an Apache web server with PHP on it running the application.
The client side is driven by AngularJS.
The second server (which does not exist yet) is going to Nginx, and session will be stored on a third Memcache server.
Websocket and application server are both on the same domain, using different port in order to send the requests to the right server (AWS ELB allows sending requests to different server groups by port). Both the application and websockets will be driven by PHP and Ratchet.
My questions are two:
is for the more experienced developers - does such architecture sounds reasonable (I'm not aiming for 100Ks of concurrent yet - I need a viable and affordable solution aiming to max 5000 concurrents at this stage).
What would be the best way to send requests from the application server (who has the logic to generate the requests) to the websocket server?
Please notice I'm new to websockets so maybe there are much better ways to do this - I'll be grateful for any idea.
I'm in the middle of using Ratchet with a SPA to power a web app. I'm using Traefik as a front-end proxy, but yes, Nginx is popular here, and I'm sure that would be fine. I like Traefik for its ability to seamlessly redirect traffic based on config file changes, API triggers, and configuration changes from the likes of Kubernetes.
I agree with Michael in the comments - using ports for web sockets other than 80 or 443 may cause your users to experience connection problems. Home connections are generally fine on non-standard ports, but public wifi, business firewalls and mobile data can all present problems, and it's probably best not to risk it.
Having done a bit of reading around, your 5,000 concurrent connections is probably something that is going to need OS-level tweaks. I seem to recall 1,024 connections can be done comfortably, but several times that level would need testing (e.g. see here, though note the comment stream goes back a couple of years). Perhaps you could set up a test harness that fires web socket requests at your app, e.g. using two Docker containers? That will give you a way to understand what limits you will run into at this sort of scale.
Your maximum number of concurrent users strikes me that you're working at an absolutely enormous scale, given that any given userbase will usually not all be live at the same time. One thing you can do (and I plan to do the same in my case) is to add a reconnection strategy in your frontend that does a small number of retries, and then pops up a manual reconnect box (Trello does something like this). Given that some connections will be flaky, it is probably a good idea to give some of them a chance to die off, so you're left with fewer connections to manage. You can add an idle timer in the frontend also, to avoid pointlessly keeping unused connections open.
If you want to do some functional testing, consider PhantomJS - I use PHPUnit Spiderling here, and web sockets seems to work fine (I have only tried one at a time so far, mind you).
There are several PHP libraries to send web socket requests. I use Websocket Client for PHP, and it has worked flawlessly for me so far.
Related
I am trying understand websockets.
I have seen 2 examples here in doc and also here.
Both examples are using endless loop cycling, listening for when a new client connects, when they do something interesting and when they are disconnected.
My question is: Is using websockets (with endless loop cycling) better than an ajax solution with http requests per x time ?
AJAX and WebSockets are vastly different. Asking if one is better than the other is like asking if a screwdriver is better than a hammer.
WebSockets are used for real time, interactive communication. Both sides of a WebSocket connection can send data and it will be received within milliseconds by the other end. The connection stays open, reducing latency due to connection negotiation.
However, it only sort of plays nicely with HTTP. That is, it plays nicely with proxies that are WebSocket aware, and with firewalls. WebSocket traffic is most definitely not HTTP traffic, except for the client's first packet, which requests switching from HTTP to the WebSocket protocol.
AJAX, on the other hand, is pure HTTP. The only difference between AJAX and a standard web request is that an AJAX request is initiated by client side scripts and the response is available to that same script rather than reloading the page.
In both AJAX and WebSockets, the client scripts can receive data and use it within that same script. That's where the similarities end.
WebSockets set up a permanent connection and both sides can send data at any time, or sit quietly at any time. With AJAX, the client makes a request and the server responds.
For instance, if you were to set up a new message notification system, if you were using WebSockets, then as soon as a new message is available, the server sends it straight to the browser. If there are no new messages, the server stays quiet. If you were using AJAX, the client would periodically send a request to the server, which would always respond, either saying there were no new messages, or delivering the notifications that are pending. There is no way for the server to initiate things on its end, it must wait for the AJAX request.
Server side, things diverge from the traditional PHP web development paradigms. A typical WebSocket server will be a stand alone, CLI application running as a daemon. (If that last sentence doesn't make sense, please spend a while taking the time to really understanding how to administer a server.)
This means that multiple clients will be connecting to the same script, and superglobal variables like $_GET and $_SESSION will be absolutely meaningless. It seems easy to conceptualize in a small use case, but remember that you will most likely want to get information from other parts of your site, which often means using libraries and frameworks that have absolutely no concept of accessing data outside of the HTTP request/response model.
Thus, for simplicity, you'll usually want to stick with AJAX requests and periodic polling, unless you have the means to rethink the network data and possibly re-implement things that your libraries automate, if you're looking to update standard web traffic.
As for the server's loop:
It is not a busy loop, it is an IO blocked loop.
If the server tries to read network data and none is available, the operating system will block (pause) the script and go off to do whatever else needs to be done. In my WS server, I block waiting for network traffic for at most 1 second at a time, before the script returns to check and see if anything else new happened that I should notify my clients of. Typically, this is barely a few milliseconds before the server goes right back to its IO blocked state waiting for new data on the wire. Some others have implemented my server using LibEv, which allows them to respond to events outside of the network IO without having to wait for the block to timeout.
This is the way nearly every server does things. This is why you can have Apache actively listening and serving web traffic without every server that runs Apache being pegged at 100% CPU usage even when there is no traffic.
In closing, WebSockets is a wonderful technology, but web libraries and frameworks are simply not built to use them. Thus, unless you're working in a system where waiting 3 seconds for a full AJAX request is far, far too long, it's probably best to use AJAX. If you're writing a multiplayer interactive game or a chat system, then you've found a perfect use for WebSockets.
I do heartily encourage everyone to learn WebSockets... but it's not a magic bullet, and few parts of the web are designed in ways where people can get real use out of it.
Yes, sockets are better in many cases.
It's not forever loop with 100% cpu utilizing, it's just liveloop, which exists in each daemon application.
Sync accept operation is where 99.99% of time we are.
Ajax heartbeat is more traffic, more server CPU and memory.
I too am in the learning phase. I have built a php-based websocket server and have it communicating with web pages. Perhaps my 2c perspective is useful.
Getting the websocket server (wss) working using available sources as a starting point is not that difficult, but what to do with it all next is.
The wss runs in CLI version of php. Late model browser loads a normal http or https page containing a request to the wss, along with anything else that page needs to do, a handshake occurs. Communication is then possible directly between browser and wss at the whim of either end. This is low overhead and hence fast and simple. Very cool. What is said over that link needs to be understood by both ends - subprotocol agreement. You may have to roll your own in php and in javascript. No more http headers, urls, etc etc.
The wss is a long-lived, stateful instance of php (very unlike apache etc which forget you on sending the page). An entire app can be run in the wss instance, keeping state for itself and each connected client. It used to be said that php was too leaky for long life but I don't hear that much any more. But I believe you still have to be careful with memory.
However, being a single php instance there is not the usual separation between client instances. For example statics in classes are shared with every class instance and hence every client. So for a single user style app sharing data with a heap of clients this is great. I can see that Ajax type calls can be replaced in this way, but if the app still had to rebuild state to service each client, and then release it to save resources, that seems to lessen the advantage.
Going a step further and keeping truly stateful instances for clients seems like a possible next step. Replicating the traditional session based system is one possibility, alternatively fork new php interpreters and look after communications between parent and children via sockets or suchlike. But this would require resources per client that would be severely limiting for any non-trivial app.
Or perhaps it is possible to put the bulk of the app in the parent and let the children just do the very client specific stuff. Or break the app design into small independent units that can communicate directly via sockets. Socket communication does seem to be catching on nowadays.
As Ghedpunk says in so many ways, the real world does not yet seem ready to realise the full potential of the web socket concept but it can certainly replace Ajax. The added advantage of the server sending without being asked opens up new possibilities previously too difficult to consider.
This is complicated, and not necessarily one question. I'd appreciate any possible help.
I've read that is is possible to have websockets without server access, but I cannot seem to find any examples that show how it is. I've come to that conclusion (that I believe I need this) based on the following two things:
I've been struggling for the past several hours trying to figure out how to even get websockets to work with the WAMP server I have on my machine, which I have root access. Installed composer, but cannot figure out how to install the composer.phar file to install ratchet. Have tried other PHP websocket implementations (would prefer that it be in PHP), but still cannot get them to work.
My current webhost I'm using to test things out on is a free host, and doesn't allow SSH access. So, even if I could figure out to get websockets with root access, it is a moot point when it comes to the host.
I've also found free VPS hosts by googling (of course, limited everything) but has full root access, but I'd prefer to keep something that allows more bandwidth (my free host is currently unlimited). And I've read that you can (and should) host the websocket server on a different subdomain than the HTTP server, and that it can even be run on a different domain entirely.
It also might eventually be cheaper to host my own site, of course have no real clue on that, but in that case I'd need to figure out how to even get websockets working on my machine.
So, if anyone can understand what I'm asking, several questions here, is it possible to use websockets without root access, and if so, how? How do I properly install ratchet websockets when I cannot figure out the composer.phar file (I have composer.json with the ratchet code in it but not sure if it's in the right directory), and this question is if the first question is not truly possible. Is it then possible to have websocket server on a VPS and have the HTTP server on an entirely different domain and if so, is there any documentation anywhere about it?
I mean, of course, there is an option of using AJAX and forcing the browser to reload a JS file every period of time that would use jQuery ajax to update a series of divs regardless of whether anything has been changed, but that could get complicated, and I'm not even sure if that is possible (I don't see why it wouldn't be), but then again I'd prefer websockets over that since I hear they are much less resource hungry than some sort of this paragraph would be.
A plain PHP file running under vanilla LAMP (i.e. mod_php under Apache) cannot handle WebSocket connections. It wouldn't be able to perform the protocol upgrade, let alone actually perform real-time communication, at least through Apache. In theory, you could have a very-long-running web request to a PHP file which runs a TCP server to serve WebSocket requests, but this is impractical and I doubt a shared host will actually allow PHP to do that.
There may be some shared hosts that make it possible WebSocket hosting with PHP, but they can't offer that without either SSH/shell access, or some other way to run PHP outside the web server. If they're just giving you a directory to upload PHP files to, and serving them with Apache, you're out of luck.
As for your trouble with Composer, I don't know if it's possible to run composer.phar on a shared host without some kind of shell access. Some hosts (e.g. Heroku) have specific support for Composer.
Regarding running a WebSocket server on an entirely different domain, you can indeed do that. Just point your JavaScript to connect to that domain, and make sure that the WebSocket server provides the necessary Cross-Origin Resource Sharing headers.
OK... you have a few questions, so I will try to answer them one by one.
1. What to use
You could use Socket.IO. Its a library for developing realtime web application based on JavaScript. It consists of 2 parts - client side (runs on the visitor browser) and server side. Basic usage does not require almost any background knowledge on Node.js. Here is an example tutorial for a simple chat app on the official Socket.IO website.
2. Hosting
Most of the hosting providers have control panel (cPanel) with the capebility to install/activate different Apache plugins and so on. First you should check if Node.js isn't available already, if not you could contact support and ask them if including this would be an option.
If you don't have any luck with your current hosting provider you could always switch hosts quickly as there are a lot of good deals out there. Google will definitely help you here.
Here is a list containing a few of the (maybe) best options. Keep in mind that although some hosting deals may be paid there are a lot of low cost options to choose from.
3. Bandwidth
As you are worried about "resource hungry" code maybe you can try hosting some of your content on Amazon CloudFront. It's a content delivery network that is widely used and guarantees quick connection and fast resource loading as the files are loaded from the closest to the client server. The best part is that you only pay for what you actually use, so if you don't have that much traffic it would be really cheap to run and still reliable!
Hope this helps ;)
Background
The website has a classic LAMP setup and runs on a virtual private server. The aim is to add an HTML5 multiplayer game which runs on the same domain, has latency of up to 500ms, maintains state on the server side and can support a few thousand concurrent games with 2-5 players per game at peak times.
Since I had little experience with PHP and server side in general, my original plan was write a game demo in node.js+socket.io and rewrite it in PHP later on. However, now that I've written the demo (~400 lines of code on the server side) I have doubts about the plan. There are two ways of integration that I am considering:
Write the game in PHP
pros:
Fewer changes to the original setup
Not having to split server side into two parts in terms of languages, templates etc.
cons:
Lack of popular solutions for real time communications in PHP
Scalability concerns
Run node.js and PHP servers in parallel
Since the site is hosted on a VPS, I think I can put nginx in front of Apache and node.js so that the client will only have to use single port on a single domain.
pros:
Ability to use socket.io for real-time communication with the client
If the game server goes down, the rest of the site will still work
cons:
complicating the setup by adding another web server and a reverse proxy
Question
As I said, I have little experience with server side, although making the demo helped a lot. Are there better ways of doing this? Did I miss out important pros/cons? Which points are the most important in practice?
With all the buzz around WebSockets, it's pretty hard to find a good walkthrough on how to use them with an Apache server on Google.
We're developing a plugin, in PHP (symfony2), which will run from time to time kind of a chat instance. And we find WebSockets more interesting, standard and quick than AJAX for this matter. The thing is, we don't have much sysadmin ressources in our group and we find hard to gather good informations on the following matters:
Can we run a WebSocket instance on a traditional Apache, dedicated server, and if yes, do you have useful links for us?
If we need to mod the server, what kind of tools would you recommend knowing that we are not too skilled in sysadmin so we can't afford to have a high maintenance b*** on this.
Thank you very much,
ps: we'll link back to your blog/site as we'll make a technical/informational post on our devblog about this part of our app.
Thank you again!
As #zaf states you are more likely to find a standalone PHP solution - not something that runs within Apache. That said there is a apache WebSocket module.
However, the fundamental problem is that Apache wasn't built with maintaining many persistent connections in mind. It, along with PHP, is built on the idea that requests are made and responses are quickly sent back. This means that resources can very quickly be used up if you are holding requests open and you're going to need to look into horizontal scaling pretty quickly.
Personally I think you have two options:
Use an alternative realtime web technology solution and communicate between your web application and realtime web infrastructure using queues or short-lived requests (web services).
Off load the handling of persistent connections and scaling of the realtime web infrastructure to a realtime web hosted service. I work for Pusher and we fall into this category.
For both self-hosted and hosted options you can check out my realtime web tech guide.
One path is to use an independent installed web sockets server.
For PHP you can try:
http://code.google.com/p/phpwebsocket/ or http://github.com/Devristo/phpws/
There are some other projects which you can try as well.
Basically, you need to upload, unpack and start running the process.
On the frontend, you'll have javascript connecting to the server on the specific port.
Most websocket servers have a demo which echoes back whatever it hears, so this is a good place to write some test code. You may even find a rudimentary chat implementation.
The tricky part is to monitor the web socket server and to make sure it runs smoothly and continuously.
Try to test on as many browsers/devices as possible as this will decide on which websocket server implementation you choose. There are old and new protocols you have to watch out for.
I introduced another websocket server: PHP Ratchet (Github).
This is better and complete list of client & server side codes and browser support.
Please check this link.
Another Path is to use a dedicated websocket server.
Try Achex Websocket Server at www.achex.ca and checkout the tutorials.
OR
If you really want Apache, check out Apache Camel. (but you have to set it up and its a bit more complicated than achex server)
http://camel.apache.org/websocket.html
I recently experienced a flood of traffic on a Facebook app I created (mostly for the sake of education, not with any intention of marketing)
Needless to say, I did not think about scalability when I created the app. I'm now in a position where my meager virtual server hosted by MediaTemple isn't cutting it at all, and it's really coming down to raw I/O of the machine. Since this project has been so educating to me so far, I figured I'd take this as an opportunity to understand the Amazon EC2 platform.
The app itself is created in PHP (using Zend Framework) with a MySQL backend. I use application caching wherever possible with memcached. I've spent the weekend playing around with EC2, spinning up instances, installing the packages I want, and mounting an EBS volume to an instance.
But what's the next logical step that is going to yield good results for scalability? Do I fire up an AMI instance for the MySQL and one for the Apache service? Or do I just replicate the instances out as many times as I need them and then do some sort of load balancing on the front end? Ideally, I'd like to have a centralized database because I do aggregate statistics across all database rows, however, this is not a hard requirement (there are probably some application specific solutions I could come up with to work around this)
I know this is probably not a straight forward answer, so opinions and suggestions are welcome.
So many questions - all of them good though.
In terms of scaling, you've a few options.
The first is to start with a single box. You can scale upwards - with a more powerful box. EC2 have various sized instances. This involves a server migration each time you want a bigger box.
Easier is to add servers. You can start with a single instance for Apache & MySQL. Then when traffic increases, create a separate instance for MySQL and point your application to this new instance. This creates a nice layer between application and database. It sounds like this is a good starting point based on your traffic.
Next you'll probably need more application power (web servers) or more database power (MySQL cluster etc.). You can have your DNS records pointing to a couple of front boxes running some load balancing software (try Pound). These load balancing servers distribute requests to your webservers. EC2 has Elastic Load Balancing which is an alternative to managing this yourself, and is probably easier - I haven't used it personally.
Something else to be aware of - EC2 has no persistent storage. You have to manage persistent data yourself using the Elastic Block Store. This guide is an excellent tutorial on how to do this, with automated backups.
I recommend that you purchase some reserved instances if you decide EC2 is the way forward. You'll save yourself about 50% over 3 years!
Finally, you may be interested in services like RightScale which offer management services at a cost. There are other providers available.
First step is to separate concerns. I'd split off with a separate MySQL server and possibly a dedicated memcached box, depending on how high your load is there. Then I'd monitor memory and CPU usage on each box and see where you can optimize where possible. This can be done with spinning off new Media Temple boxes. I'd also suggest Slicehost for a cheaper, more developer-friendly alternative.
Some more low-budget PHP deployment optimizations:
Using a more efficient web server like nginx to handle static file serving and then reverse proxy app requests to a separate Apache instance
Implement PHP with FastCGI on top of nginx using something like PHP-FPM, getting rid of Apache entirely. This may be a great alternative if your Apache needs don't extend far beyond mod_rewrite and simpler Apache modules.
If you prefer a more high-level, do-it-yourself approach, you may want to check out Scalr (code at Google Code). It's worth watching the video on their web site. It facilities a scalable hosting environment using Amazon EC2. The technology is open source, so you can download it and implement it yourself on your own management server. (Your Media Temple box, perhaps?) Scalr has pre-built AMIs (EC2 appliances) available for some common use cases.
web: Utilizes nginx and its many capabilities: software load balancing, static file serving, etc. You'd probably only have one of these, and it would probably implement some sort of connection to Amazon's EBS, or persistent storage solution, as mentioned by dcaunt.
app: An application server with Apache and PHP. You'd probably have many of these, and they'd get created automatically if more load needed to be handled. This type of server would hold copies of your ZF app.
db: A database server with MySQL. Again, you'd probably have many of these, and more slave instances would get created automatically if more load needed to be handled.
memcached: A dedicated memcached server you can use to have centralized caching, session management, et cetera across all your app instances.
The Scalr option will probably take some more configuration changes, but if you feel your scaling needs accelerating quickly it may be worth the time and effort.