I am trying to build something like Facebook's live feed: when someone likes or comments on something, the page updates without being refreshed. What is the proper way to do this?
Real-time updates in a web application are a hard problem because handling many simultaneous long-lived TCP connections on a single server is a hard problem.
This is essentially impossible on a traditional web server like Apache + PHP because it dedicates an entire OS thread (or process) to each incoming connection. Threads have significant overhead (roughly 2 MB of RAM just for the stack space, plus whatever heap memory your application needs), so as few as a few hundred clients having your page open at the same time can bring a small server to its knees, and even an extra-large (and extra-expensive) hundred-GB-of-RAM server can only handle a few thousand concurrent connections.
Real-time communication is where Node really shines. Its single-threaded, event-driven architecture can easily support 2,000 concurrent connections on a commodity laptop, because each incoming connection costs only a small (a few kilobytes) heap allocation. The limiting factor then becomes the CPU and the underlying OS's TCP stack.
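To put rough numbers on the comparison above, here is a back-of-the-envelope calculation. It is only a sketch: the 2 MB stack figure is the one quoted above, while the 4 KB per event-driven connection is an assumed stand-in for "a few kilobytes", and heap usage of the application itself is ignored.

```python
# Rough per-connection memory cost under the two models.
STACK_BYTES_PER_THREAD = 2 * 1024 * 1024  # ~2 MB of stack per thread (figure quoted above)
BYTES_PER_EVENT_CONN = 4 * 1024           # assumed value for "a few kilobytes"


def memory_mb(connections, bytes_per_connection):
    """Return the memory needed for the given number of connections, in MB."""
    return connections * bytes_per_connection / (1024 * 1024)


for n in (500, 2000, 10000):
    threaded = memory_mb(n, STACK_BYTES_PER_THREAD)
    evented = memory_mb(n, BYTES_PER_EVENT_CONN)
    print(f"{n:>6} connections: {threaded:>8.0f} MB threaded vs {evented:>6.1f} MB event-driven")
```

Under these assumptions the stack space alone for 2,000 threads is about 4 GB, while the event-driven bookkeeping for the same 2,000 connections is under 10 MB.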
My recommendation is to take a look at Node – this is exactly the kind of problem it is designed for. You already know JavaScript, so it's really just a matter of the API and mastering Node's async, event-driven nature.
You'll probably want to use Express for your HTTP server needs and use Socket.io for the realtime communications.
Socket.io is especially wonderful because its client-side library abstracts away all of the drudgery of cross-browser support:
In A-grade browsers, it connects to your server via WebSockets. This gets you a TCP socket that remains connected indefinitely, over which you can push arbitrary data at any time.
In downlevel browsers, it uses a fallback mechanism:
A Flash-based transport that behaves like WebSockets but requires Flash Player (used if available)
AJAX long polling
And some more esoteric fallbacks if neither of those works
You can use long polling, yes. Or, you can start to innovate and start using HTML5's connectivity capabilities and REALTIME the sh*t out of your site. There are already several out-of-the-box solutions for that, my favourite being the xRTML Realtime Framework.
Check it out
I am trying to understand websockets.
I have seen 2 examples here in doc and also here.
Both examples are using endless loop cycling, listening for when a new client connects, when they do something interesting and when they are disconnected.
My question is: is using websockets (with endless loop cycling) better than an AJAX solution that sends an HTTP request every x amount of time?
AJAX and WebSockets are vastly different. Asking if one is better than the other is like asking if a screwdriver is better than a hammer.
WebSockets are used for real time, interactive communication. Both sides of a WebSocket connection can send data and it will be received within milliseconds by the other end. The connection stays open, reducing latency due to connection negotiation.
However, WebSockets only sort of play nicely with HTTP infrastructure: they work through firewalls, and through proxies that are WebSocket-aware. WebSocket traffic is most definitely not HTTP traffic, except for the opening handshake, in which the client asks to switch from HTTP to the WebSocket protocol and the server agrees.
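For the curious, the server's half of that protocol switch is small. The sketch below follows the handshake defined in RFC 6455: the server appends a fixed GUID to the client's Sec-WebSocket-Key header, hashes it with SHA-1, and returns the base64 result in a 101 response. The function names are my own, and a real server would also validate the request headers.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(client_key: str) -> str:
    """Build the HTTP response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(client_key)}\r\n"
        "\r\n"
    )


# Sample key taken from the RFC 6455 example handshake.
print(handshake_response("dGhlIHNhbXBsZSBub25jZQ=="))
```

After this response is sent, everything on the socket is WebSocket frames, not HTTP.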
AJAX, on the other hand, is pure HTTP. The only difference between AJAX and a standard web request is that an AJAX request is initiated by client side scripts and the response is available to that same script rather than reloading the page.
In both AJAX and WebSockets, the client scripts can receive data and use it within that same script. That's where the similarities end.
WebSockets set up a permanent connection and both sides can send data at any time, or sit quietly at any time. With AJAX, the client makes a request and the server responds.
For instance, suppose you were to set up a new-message notification system. With WebSockets, as soon as a new message is available, the server sends it straight to the browser. If there are no new messages, the server stays quiet. With AJAX, the client would periodically send a request to the server, which would always respond, either saying there were no new messages or delivering the notifications that are pending. There is no way for the server to initiate things on its end; it must wait for the AJAX request.
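A tiny model of the notification example makes the trade-off visible. This is only an illustration with made-up numbers: message times are in seconds, polling is assumed to happen at exact multiples of the interval, and network latency is ignored.

```python
import math


def poll_delivery_times(message_times, interval_s):
    """With polling, a message is only seen at the next poll after it arrives."""
    return [math.ceil(t / interval_s) * interval_s for t in message_times]


def poll_request_count(duration_s, interval_s):
    """Polling costs one request per interval, whether or not anything is new."""
    return duration_s // interval_s


messages = [1, 5, 12]  # hypothetical arrival times, in seconds
print("push delivers at:", messages)  # a push is delivered as it happens
print("poll delivers at:", poll_delivery_times(messages, 5))
print("requests per hour at a 5 s interval:", poll_request_count(3600, 5))
```

With a 5 second interval the client makes 720 requests an hour even if only three messages ever arrive, and each message waits for the next poll.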
Server side, things diverge from the traditional PHP web development paradigms. A typical WebSocket server will be a stand-alone CLI application running as a daemon. (If that last sentence doesn't make sense, please take the time to really understand how to administer a server.)
This means that multiple clients will be connecting to the same script, and superglobal variables like $_GET and $_SESSION will be absolutely meaningless. It seems easy to conceptualize in a small use case, but remember that you will most likely want to get information from other parts of your site, which often means using libraries and frameworks that have absolutely no concept of accessing data outside of the HTTP request/response model.
Thus, for simplicity, if you're just looking to update standard web pages, you'll usually want to stick with AJAX requests and periodic polling, unless you have the means to rethink how data flows over the network and possibly re-implement things that your libraries automate.
As for the server's loop:
It is not a busy loop, it is an IO blocked loop.
If the server tries to read network data and none is available, the operating system will block (pause) the script and go off to do whatever else needs to be done. In my WS server, I block waiting for network traffic for at most 1 second at a time, before the script returns to check whether anything else new has happened that I should notify my clients of. Typically, this takes barely a few milliseconds before the server goes right back to its IO-blocked state, waiting for new data on the wire. Some others have implemented my server using LibEv, which allows them to respond to events outside of the network IO without having to wait for the block to time out.
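The same idea can be sketched in a few lines of Python (not the PHP server described above, just the general pattern). `select.select` puts the process to sleep until the socket is readable or the timeout expires, so the loop uses no CPU while idle. The helper name `wait_for_data` is my own.

```python
import select
import socket


def wait_for_data(sock, timeout_s=1.0):
    """Sleep until sock is readable or the timeout expires.

    Returns the received bytes, or None if the wait timed out.
    """
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if not readable:
        return None  # timed out: a chance to do housekeeping, then wait again
    return sock.recv(4096)


# A connected pair of sockets stands in for a client and the server.
server_end, client_end = socket.socketpair()

print(wait_for_data(server_end, 0.1))  # nothing sent yet, so this times out
client_end.sendall(b"ping")
print(wait_for_data(server_end, 0.1))  # returns as soon as data is available

server_end.close()
client_end.close()
```

A real server would pass every client socket (plus the listening socket) to the same `select` call inside its main loop.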
This is the way nearly every server does things. This is why you can have Apache actively listening and serving web traffic without every server that runs Apache being pegged at 100% CPU usage even when there is no traffic.
In closing, WebSockets is a wonderful technology, but web libraries and frameworks are simply not built to use it. Thus, unless you're working in a system where waiting 3 seconds for a full AJAX request is far, far too long, it's probably best to use AJAX. If you're writing a multiplayer interactive game or a chat system, then you've found a perfect use for WebSockets.
I do heartily encourage everyone to learn WebSockets... but it's not a magic bullet, and few parts of the web are designed in ways where people can get real use out of it.
Yes, sockets are better in many cases.
It's not an endless loop burning 100% CPU; it's just the main loop that every daemon application has.
The process spends 99.99% of its time blocked in a synchronous accept call.
An AJAX heartbeat means more traffic and more server CPU and memory.
I too am in the learning phase. I have built a php-based websocket server and have it communicating with web pages. Perhaps my 2c perspective is useful.
Getting the websocket server (wss) working using available sources as a starting point is not that difficult, but what to do with it all next is.
The wss runs in the CLI version of php. A recent browser loads a normal http or https page that, along with anything else the page needs to do, opens a connection to the wss, and a handshake occurs. Communication is then possible directly between browser and wss at the whim of either end. This is low overhead and hence fast and simple. Very cool. What is said over that link needs to be understood by both ends: a subprotocol agreement. You may have to roll your own in php and in javascript. No more http headers, urls, etc.
The wss is a long-lived, stateful instance of php (very unlike apache etc which forget you on sending the page). An entire app can be run in the wss instance, keeping state for itself and each connected client. It used to be said that php was too leaky for long life but I don't hear that much any more. But I believe you still have to be careful with memory.
However, being a single php instance there is not the usual separation between client instances. For example statics in classes are shared with every class instance and hence every client. So for a single user style app sharing data with a heap of clients this is great. I can see that Ajax type calls can be replaced in this way, but if the app still had to rebuild state to service each client, and then release it to save resources, that seems to lessen the advantage.
Going a step further and keeping truly stateful instances for clients seems like a possible next step. Replicating the traditional session based system is one possibility, alternatively fork new php interpreters and look after communications between parent and children via sockets or suchlike. But this would require resources per client that would be severely limiting for any non-trivial app.
Or perhaps it is possible to put the bulk of the app in the parent and let the children just do the very client specific stuff. Or break the app design into small independent units that can communicate directly via sockets. Socket communication does seem to be catching on nowadays.
As Ghedpunk says in so many ways, the real world does not yet seem ready to realise the full potential of the web socket concept but it can certainly replace Ajax. The added advantage of the server sending without being asked opens up new possibilities previously too difficult to consider.
I have a web application driven primarily by javascript/ajax, somewhat similar to how google docs work; all people viewing a page will be seeing the same information in relative real-time. It's not crucial that the information is actually real-time, a second or so is fine.
Currently, the application is ajaxing the server every 5 seconds. I was researching server-sent events and they sound like exactly what I need... but this is my understanding: server-sent events essentially just move the polling to the server. The PHP script doing the server-sent events will check the database for changes every X seconds, and send an update to the application when it finds one.
Checking once per second would probably be adequate, but since I'm on shared hosting I want to avoid any unnecessary load possible. Is there a way I can subscribe to updates to the database? Or is there a way I can notify the script from other PHP scripts that make changes to the database?
With PHP, polling the DB is the typical way to do this. You could also use TCP/IP sockets to connect to some kind of application server that sits in front of your database and knows about all writers and all consumers. I.e. when a write comes in, it both broadcasts it to all consumers and writes it to the DB. The consumers in that example are the PHP scripts (one per SSE client).
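As a rough illustration of that application-server idea (in Python rather than PHP, and in-process rather than over TCP), the core is just "on every write, store it and fan it out". The `Broker` class and its method names are hypothetical, and the list stands in for the real database.

```python
import queue


class Broker:
    """Sits in front of the store; knows every writer and every consumer."""

    def __init__(self):
        self.log = []        # stands in for the real database
        self.consumers = []  # one queue per connected SSE script

    def subscribe(self):
        """Register a consumer and return the queue it should read from."""
        q = queue.Queue()
        self.consumers.append(q)
        return q

    def write(self, change):
        """Persist a change, then broadcast it to every consumer."""
        self.log.append(change)
        for q in self.consumers:
            q.put(change)


broker = Broker()
alice = broker.subscribe()
bob = broker.subscribe()
broker.write({"doc": 1, "text": "hello"})
print(alice.get_nowait(), bob.get_nowait())
```

Each consumer blocks on its own queue instead of polling the database, so nothing happens until a writer actually changes something.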
If you use WebSockets, then you need exactly the same architecture, because PHP is single-threaded: each SSE connection is an independent PHP process.
If you switch to using, say, node.js, then that application server can be built-in. (Again, it would work the same way, whether SSE or WebSockets.)
But, you mention you intend to use shared hosting. SSE (and WebSockets, and comet technologies) hold a socket open, which interferes with the economics of shared hosting. So your sockets are likely to get closed regularly. My advice would be to stick with ajax (and therefore DB) polling every 5 seconds, instead of SSE, until your application is worth enough that the $10-$100/month for a real host is not an issue. Then consider using SSE to optimize the latency.
P.S. The decision between SSE and WebSockets is all about write frequency. My guideline is if your clients write data, on average, once a second or more frequently, web sockets are better, because it is keeping the write channel open. If once every 5+ seconds then web sockets does not bring much, compared to just using an Ajax post each time you have data to write. An SSE back-end is simpler to deal with than a WebSockets back-end. (Writes every 1-5 seconds is the grey area.)
Instead of polling the database for changes, I would recommend relying on the fact that your application already knows when a change happens, because it is the one making the change. I would use web sockets (https://developer.mozilla.org/en-US/docs/WebSockets) and simply push an update to all active clients when any member makes a change.
Here is the difference between Server-Sent Events and Web Sockets. (In your case Web Sockets are the way to go.)
Websockets and SSE (Server Sent Events) are both capable of pushing data to browsers, however they are not competing technologies.
Websockets connections can both send data to the browser and receive data from the browser. A good example of an application that could use websockets is a chat application.
SSE connections can only push data to the browser. Online stock quotes, or Twitter's updating timeline or feed, are good examples of applications that could benefit from SSE.
In practice since everything that can be done with SSE can also be done with Websockets, Websockets is getting a lot more attention and love, and many more browsers support Websockets than SSE.
However, it can be overkill for some types of application, and the backend could be easier to implement with a protocol such as SSE.
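Part of why an SSE backend can be simpler is that the wire format is plain text over an ordinary HTTP response with the `text/event-stream` content type: each event is a few `field: value` lines followed by a blank line. Here is a minimal sketch of a formatter; the function name and the sample payloads are my own.

```python
def format_sse(data, event=None, event_id=None):
    """Serialize one Server-Sent Event as text/event-stream text."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads are sent as several consecutive data: lines.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


print(format_sse("AAPL 101.5", event="quote", event_id=7), end="")
```

On the browser side, an `EventSource` object parses this stream and fires one event per blank-line-terminated block.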
I am writing Web chat where you have several one-on-one conversations with people on the screen at the same time. (Basically, like a personal messenger, without group chats).
My technology options seem to be Long Polling and WebSockets, and I'm trying to choose.
The upside with Long Polling is that it's very easy to implement, and I can return whatever data I want (a customized JSON object with the data required to update the page).
What I'm afraid of with WebSockets is that there's no native library for it in PHP, so you have to shop between different 3rd party ones, and the concepts seem more complicated, what with channels and subscriptions and what have you.
Browser compatibility is not an issue for me.
Is the performance of Long Polling much poorer than with Websockets? If no, then my decision is easy!
Is there a really simple Websocket server for PHP? Or is the concept so simple I could write my own? (Mozilla has a really simple tutorial on writing a client, but not on a server).
Assuming that your long-polling scheme involves an endpoint hosted by the same web server as your frontend, this will mean two active connections for every user of the application, so you will basically cut the number of users you can support in half. Your websocket server would run on a different port and can bypass your web server, so the connections are a lot of saved overhead with websockets.
Another place websockets save on overhead is that once your connection is established, there is no need for constant requests and responses. Zombie websocket connections are essentially free in terms of both bandwidth and CPU.
Finally, I would not think that long polling would be simpler to implement. Since websockets are designed to do exactly what you want, I think that leveraging an existing websocket package will actually save you some lines of code. I would look at Ratchet (feature-rich) or phpwebsocket (lite), if you want to use PHP.
Long polling definitely performs much worse than WebSockets.
Using any WebSocket library with PHP is not recommended, especially for chat applications.
I suggest using Python, Ruby or Node.js instead.
I've been reading a few posts on here regarding polling and even had a look at Pusher, although I don't want to go down that route, and I need some advice on making an efficient notification system. How do Facebook, Twitter and other websites do this? Are they using web sockets?
Polling
> Polling data from server - most
> efficient and practical way/setup
You should avoid polling because it is not efficient and also not real-time at all. Let's say you poll every 30 seconds and your server can handle the load without any problems. The problem then is that your data is not real-time. You could shorten the poll-interval (every second), but then you will have a very difficult time trying to scale your server.
But sometimes polling (smart) is also very nice, because it is easy to implement. Some tips I have for you are:
Don't hit the database directly; retrieve data from an in-memory store like Redis or Memcached instead, because they are much faster. That is the secret ingredient that lets most of the popular big players (websites) run smoothly. Facebook has special-purpose servers with a lot of memory running memcached (5 TB in 2008 => Facebook has grown a lot since ;)).
If you can't install (probably should!) Memcached or Redis on your server you could consider using the hosted http://redistogo.com which is free for small sites.
The other thing to do is to increase the poll interval while nothing is changing, using a library from GitHub, to prevent overloading the server.
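I can't reproduce that library here, but the underlying idea is simple enough to sketch: back off while polls come back empty, and snap back to the fast interval as soon as something new arrives. The function name and the default values below are assumptions of mine, not that library's API.

```python
def next_interval(current_s, had_new_data, base_s=1.0, factor=2.0, max_s=60.0):
    """Return how long to wait before the next poll.

    Empty polls multiply the interval by `factor` up to `max_s`;
    a poll that returned data resets it to `base_s`.
    """
    if had_new_data:
        return base_s
    return min(current_s * factor, max_s)


interval = 1.0
for had_new_data in (False, False, False, True, False):
    interval = next_interval(interval, had_new_data)
    print(interval)
```

An idle page then costs the server one request a minute instead of one a second, at the price of slower delivery for the first update after a quiet period.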
> How do facebook, twitter and other
> websites do this? are they using web sockets?
Web-sockets
Some of these sites are using websockets, but that is only one of the many transports that they support because websockets aren't available in all browsers. In the future when all browsers support websockets that will be the only transport used (probably). Below I will give you a list of all the popular transports with a quick description:
websockets:
WebSocket is a technology providing for bi-directional, full-duplex communications channels, over a single Transmission Control Protocol (TCP) socket. It is designed to be implemented in web browsers and web servers, but it can be used by any client or server application.
For the client side, WebSocket was to be implemented in Firefox 4, Google Chrome 4, Opera 11, and Safari 5, as well as the mobile version of Safari in iOS 4.2. However, although present, support is now disabled by default in Firefox and Opera because of concerns over security vulnerabilities.
xhr long-polling:
For the most part, XMLHttpRequest long polling works like any standard use of XHR. The browser makes an asynchronous request of the server, which may wait for data to be available before responding.
This transport is available in every browser.
json-p long-polling:
A long-polling Comet transport can be created by dynamically creating script elements, and setting their source to the location of the Comet server, which then sends back JavaScript (or JSONP) with some event as its payload. Each time the script request is completed, the browser opens a new one, just as in the XHR long polling case. This method has the advantage of being cross-browser while still allowing cross-domain implementations.
htmlfile:
Provides a usable streaming transport in Internet Explorer.
flashsocket:
XMLSocket is a class in ActionScript which allows Adobe Flash content to use socket communication, via TCP stream sockets. It can be used for plain text, although, as the name implies, it was made for XML. It is often used in chat applications and multiplayer games.
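Of the transports listed above, long polling is the easiest to picture from the server's side: hold the request open until there is something to say or a timeout passes, then respond and let the client ask again. Below is a minimal sketch of that server-side wait using Python's asyncio; the `long_poll` name and the in-memory queue standing in for "new data for this client" are assumptions for illustration.

```python
import asyncio


async def long_poll(inbox, timeout_s):
    """Wait for the next message, or return None if the timeout passes first."""
    try:
        return await asyncio.wait_for(inbox.get(), timeout_s)
    except asyncio.TimeoutError:
        return None  # respond empty; the client immediately polls again


async def demo():
    inbox = asyncio.Queue()
    # First request: nothing arrives, so it is answered empty after the timeout.
    first = await long_poll(inbox, 0.05)
    # Second request: a message shows up while the request is being held open.
    asyncio.get_running_loop().call_later(0.01, inbox.put_nowait, "new message")
    second = await long_poll(inbox, 1.0)
    return first, second


print(asyncio.run(demo()))
```

The second request returns as soon as the message arrives rather than after the full timeout, which is what makes long polling feel close to real-time.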
Facebook
As you probably know, Facebook uses PHP for its active development but doesn't use it for any of the real-time elements on its site, because PHP is not designed to handle this properly (yet?). A lot of people get mad at me for saying this, but I can't help that it is the truth (even Facebook agrees). In PHP, almost all of the underlying (C) function calls use blocking I/O, which makes scaling real-time systems almost impossible. I read a blog post over here about using non-blocking IO with PHP (quality?). In the past, Facebook built its chat using Erlang, which is also pretty popular for non-blocking IO. I myself find Erlang code strange-looking, but I would still like to learn it. Here are some links about Facebook using Erlang:
https://www.facebook.com/note.php?note_id=91351698919
https://www.facebook.com/note.php?note_id=51412338919
http://www.infoq.com/news/2008/05/facebookchatarchitecture
Facebook also bought FriendFeed in the past and open-sourced its Tornado framework, which is written in Python and does non-blocking IO.
It is no longer just the traditional Linux, Apache, MySQL, and PHP stack that make a site like Facebook or FriendFeed possible, but new infrastructure tools like Tornado, Cassandra, Hive (built on top of Hadoop), memcache, Scribe, Thrift, and others are essential. We believe in releasing generically useful infrastructure components as open source (see Facebook Open Source) as a way to increase innovation across the Web.
I assume they are also using Tornado for some parts of their system now.
Other sites
Below I will try to list some popular frameworks(open-source) to do non-blocking IO:
Socket.io (Node.js): Socket.io is becoming a pretty popular non-blocking framework for node.js (1722 people are watching this project on Github right now). I really like Socket.io
Netty (Java): The Netty project is an effort to provide an asynchronous event-driven network application framework and tools for rapid development of maintainable high performance & high scalability protocol servers & clients.
tornado (Python): Tornado is an open source version of the scalable, non-blocking web server and tools that power FriendFeed.
> even had a look at Pusher although I
> don't want to go down that route
I don't understand your dislike for Pusher, because it is a pretty popular hosted solution. With this hosted solution you could start building a scalable real-time service without any hassle and with pretty good pricing (free for small sites and pretty affordable for mid-range websites).
I was researching about websockets and came to know about 'Kaazing WebSocket Gateway'. Kaazing WebSocket Gateway provides complete WebSocket emulation for all the older browsers (I.E. 5.5+, Firefox 1.5+, Safari 3.0+, and Opera 9.5+), so you can start using the HTML5 WebSocket APIs today.
Please see this link
I'd like to develop a near real time web based chat system. Any suggestions on how to implement this via jQuery, any gotchas to look out for, and what is this Comet thing I keep reading about?
Ideally, I'd like to support up to about 5,000 concurrent chatters.
Comet, also known as Ajax Push, is often referred to as "Reverse AJAX". Instead of pulling the information from the server at regular intervals, the data is pushed from the server to the browser when it is needed. That requires an open connection, for which there are several implementations.
I recommend that you use APE. Here is a demo: http://www.ape-project.org/demos/1/ape-real-time-chat.html
Advantage: It will be very responsive and real-time.
Disadvantage: You need to set up the APE server on your webserver machine.
Comet is a "push" technology, created to avoid having the client (JavaScript code) continuously poll the server. Polling can cause bandwidth problems, because each poll may have to create a new TCP connection and contact the HTTP server, which runs some server-side logic and then sends a response to the client. With Comet, if the server decides that you should receive some information (e.g., a new chat message), it sends it directly to the client.
There are several different implementations; you can start here.
The simplest implementation technique is the hidden iframe, but I'd recommend long polling, which is much more controllable.
One more thing: take a look at HTML5 WebSockets, which could be an interesting solution to your problem (not very compatible with current browsers, anyway).
Check out Node.js and nowjs for Node.js. Node.js helps you build very efficient servers using server-side JavaScript, and nowjs is a library that allows you to build real-time web apps. There is even an example screencast that puts together a basic chat application in 12 lines of code.
You could also check out Socket.io, which is another Node library that helps you build real-time apps by abstracting away the different transport mechanisms and giving you a unified interface to code against (supports WebSockets, Flash Sockets, AJAX long polling, JSONP Polling and Forever IFrames).
I realize you tagged your question PHP, but if you are seriously considering writing a scalable system with the least amount of effort (relatively speaking), then learning Node.js is worth your time (and the learning curve is not that steep, since you probably already know JS).