How to monitor page for changes without polling?

How to monitor page for changes without polling? - php

I currently have an IRC bot written in C++ which monitors a page written in php for changes and then outputs these changes to the IRC channel.
However the current method is rather in-effective as it just constantly polls the page once every 10 seconds and compares it to the last seen version to check if anything has changed.
I can decrease the page check interval to about 2-3 seconds before the IRC bot starts to take a performance hit, however this isn't ideal.
Often the page I am monitoring can change multiple times within the 10 second period, so a change could be missed, what would be a better method to get the data from the page? considering I control both the page written in PHP, and the IRC bot, but they are on different servers.
The sole purpose of this page is to pass data to the IRC bot, so it could be completely re-implemented as something else if that would be a better solution; the IRC bot also monitors multiple versions of this page to check for different things.

If the data generated by PHP isn't somehow pushed on a stream (broadcast or feed), you don't have any other choice than polling the page, unfortunately.
What you could do is push the data from PHP using broadcast, or make a persistent connection from the bot to the PHP script, or make the PHP calculate the differences itself.

The PHP script should send a message to a public port or path that your IRB bot listens on, containing information about any posts made. This way, you are notified only when a message arrives.
One note about doing these sorts of things, beware if there are a lot of posts within a short period; if concurrency is important, you'll want to implement this using a proper MQ service like 0MQ/RabbitMQ/InsertMQFrameworkNameHere to ensure the messages arrive in order and are guaranteed sending and receiving.

If you need to monitor every change, then have your PHP page "push" data to your bot rather than your IRC bot "pull" data from the page (through polling). This can be done over any network socket, even something like a HTTP POST request from your PHP page to your bot over port 80.

A good alternative to polling is Comet. Here are examples (for JavaScript though): http://www.zeitoun.net/articles/comet_and_php/start.

I would suggest this approach:
when you retrieve your page, specify a very long timeout, say 10 minutes (bear with me for a moment);
if you have a new page, let the server return it; otherwise just don't send a reply
if there is no page, the client will wait for up to 10 minutes before giving up (timing out); but, if during this time a new page is there, your server can reply to the request and pass the page to the client;
in case the timeout fires, you simply send another request with the same long timeout.
Hope I could explain it clearly. The only tricky point is how your web page (PHP) can hold the wait when a request arrives if there is no new data to send back.
This can be easily accomplished like this:
if ($newDataAvailable) {
file_put_contents($data, $request_uri);
return;
}
while (!$newDataAvailable) {
usleep(10000);
$newDataAvailable = <check_for_data>;
}
//-- here data is available
<build response using get_file_contents($uri)>
<send response>

Related

Should I use ajax for updating messages in a chat-client?

I am writing a simple chat-client (completely intended for learning purposes). My android phone sends messages to a remote MySQL database, and I am in the process of getting the browser to display any new incoming messages.
My current approach is using javascript: it calls a function every 5 seconds, which in turn calls a php that queries for new messages and sends it back to the browser.
I have no experience in ajax, but I've heard it is good when data has to change in a webpage constantly without having to refresh the page, which fits my situation.
My question is, does this sound like something I should use ajax for?

Yes, ajax is the way to go. However, what you have suggested (checking for messages every 5 seconds) generates a lot of requests and bandwidth. You should look into comet, which is still ajax but uses it in a different way.
Comet essentially is this:The client sends a request to the server. The php file on the server has a loop checking every few seconds for a message. When the server finds a message, it echos the message, but it doesn't close the connection. When another message arrives, it echos it again, but doesn't close the connection. This allows it to only need 1 request instead of hundreds. See http://www.zeitoun.net/articles/comet_and_php/start

I'll advice you to go for either ajax or websockets... if you're going for websocket, learn node.js... it has a lot of cool feature as platform built on Google Javascript V8 engine

Communicate from one page to another? JavaScript, PHP?

I'm building a prototype where on one page A I have a controller. Imagine a TV remote. On another page B, independent of form A -- it can be a different screen/computer -- with some elements X, Y, and Z that are going to be animated by the remote on page A.
My first idea would be:
Remote on page A saves an action wanted and sends JSON.
I create a listener on page B that reads the JSON to listen to what action to trigger.
I wonder if this the right direction.
It is just for a prototype so it doesn't have to be a production-perfect idea.

You can use web sockets for this.

Let's assume you're using 2 computers, both pointed at the website, as this technique could work in both scenarios.
If it's just a prototype, you could just simply have your page B poll the server every 5 seconds to look for updates that were submitted by page A.
However, in reality, for a production app with thousands of users, this could consume lots of bandwidth and put a heavy load on your server. To compensate for the load and bandwidth usage, you could increase the polling rate to 10 seconds, or 30 seconds, but in response to this change your users would experience delays while they wait for the browser to request an update from the server.
In a production app, many developers are turning to Comet as a solution. Comet is basically a term given by Alex Russell for a technique that involves using the request/response cycle to simulate server-push.
The general idea is that the browser makes a request to the server, and the server holds that connection open indefinitely. The connection remains open until another user posts an update to the server, in which case, the server then sends a response to the connected users.
The Dojo and Jetty team have demonstrations where they show that this technique can scale to 20,000 connections, when using continuations.
While I think you can carry out your experiment just fine with a database and/or some session variables, if your want to learn more about Comet on PHP, check out How to Implement Comet with PHP. Good luck!
UPDATE:
I also wanted to say that you definitely have the right idea for how to conceptually think about your message passing with JSON:
I create a listener on page B that reads the JSON to listen to what action to trigger.
I really like how you are thinking about passing a message that then tells the page what action to trigger. If you think about it, you could reuse your message passing concept to call other commands so that you avoid reinventing the wheel when a new command comes along that you need to call. Regardless of whether you poll, use Comet, or use WebSockets, it's a great idea to think about abstractions and generic, reusable data transports.

You could do this either with polling (having page B constantly poll for updates from the server) or use a server push technology like server sent events or websockets.

Yes, that would work. You could also just make it the same way you would make a vector line animation. Send the "commands" for movement to a server and record them (in a database, file, whatever) the client program can then request and redraw the movement smoothly any time and anywhere.

Using a cron job to execute Page B for every x unit time will make you check for any latest updated json (queried/returned output according your logic) from Page A. This way, you can use new updated json from page A and do your further task...

PHP infinitive loop or jQuery setInterval?

Js:
<script>
function cometConnect(){
$.ajax({
cache:false,
type:"post",
data:'ts='+1,
url: 'Controller/chatting',
async: true,
success: function (arr1) {
$(".page-header").append(arr1);
},
complete:function(){
cometConnect(true);
nerr=false;
},
dataType: "text"
});
}
cometConnect();
</script>
Php:
public function chatting()
{
while(true)
{
if(memcache_get(new_message))
return new_message;
sleep(0.5);
}
}
Is this a better solution than setting setInterval which connects to the PHP method which returns message if there is any every 1 second (1 sec increases +0.25 every 5 seconds let's say)?
If I used first solution, I could probably use sleep(0.5) it would give me messages instantly, because php loop is cheap, isn't?
So, what solution is better (more importantly, which takes less resources?). Because there are going to be hundreds of chats like this.
Plus, can first solution cause problems? Let's say I would reload a page or I would stop execution every 30 secs so I wouldn't get 502 Bad Gateway.
EDIT: I believe the second solution is better, so I am going to reimplement my site, but I am just curious if this can cause problems to the user or not? Can something not expected happen?
First problem I noticed is that you can't go to other page until there is at least one new message.

A chat is a one to many communication, while each one of the many can send messages and will receive messages from everybody else.
These two actions (sending, receiving) happen continuously. So this looks like an endless loop whereas the user can enter into (join the chat) and exit (leave the chat).
enter
send message
receive message
exit
So the loop looks like this (pseudo-code) on the client side:
while (userInChat)
{
if (userEnteredMessages)
{
userSendMessages(userEnteredMessages)
}
if (chatNewMessages)
{
displayMessages(chatNewMessages)
}
}
As you already note in your question, the problem is in implementing such a kind of chat for a website.
To implement such a "loop" for a website, you are first of all facing the situation that you don't want to have an actual loop here. As long as the user is in chat, it would run and run and run. So you want to distribute the execution of the loop over time.
To do this, you can convert it into a collection of event functions:
ChatClient
{
function onEnter()
{
}
function onUserInput(messages)
{
sendMessages = send(messages)
display(sendMessages)
}
function onReceive(messages)
{
display(messages)
}
function onExit()
{
}
}
It's now possible to trigger events instead of having a loop. Only left is the implementation to trigger these events over time, but for the moment this is not really interesting because it would be dependent to how the chat data exchange is actually implemented.
There always is a remote point where a chat client is (somehow) connected to to send it's own messages and to receive new messages from.
This is some sort of a stream of chat messages. Again this looks like a loop, but infact it's a stream. Like in the chat clients loop, at some point in time it hooks onto the stream and will send input (write) and receive output (read) from that stream.
This is already visible in the ChatClient pseudo code above, there is an event when the user inputs one or multiple messages which then will be send (written). And read messages will be available in the onReceive event function.
As the stream is data in order, there needs to be order. As this is all event based and multiple clients are available, this needs some dedicated handling. As order is relative, it will only work in it's context. The context could be the time (one message came before another message), but if the chat client has another clock as the server or another client, we can't use the existing clock as time-source for the order of messages, as it normally differs between computers in a WAN.
Instead you create your own time to line-up all messages. With a shared time across all clients and servers an ordered stream can be implemented. This can be easily done by just numbering the messages in a central place. Luckily your chat has a central place, the server.
The message stream starts with the first message and ends with the last one. So what you simply do is to give the first message the number 1 and then each new message will get the next higher number. Let's call it the message ID.
So still regardless which server technology you'll be using, the chat knows to type of messages: Messages with an ID and messages without an ID. This also represents the status of a message: either not part or part of the stream.
Not stream associated messages are those that the user has already entered but which have not been send to the server already. While the server receives the "free" messages, it can put them into the stream by assigning the ID:
function onUserInput(messages)
{
sendMessages = send(messages)
display(sendMessages)
}
As this pseudo code example shows, this is what is happening here. The onUserInput event get's messages that are not part of the stream yet. The sendMessages routine will return their streamed representation which are then displayed.
The display routine then is able to display messages in their stream order.
So still regardless how the client/server communication is implemented, with such a structure you can actually roughly handle a message based chat system and de-couple it from underlying technologies.
The only thing the server needs to do is to take the messages, gives each message an ID and return these IDs. The assignment of the ID is normally done when the server stores the messages into it's database. A good database takes care to number messages properly, so there is not much to do.
The other interaction is to read new messages from the server. To do this over network effectively, the client tells the server from which message on it likes to read from. The server will then pass the messages since that time (ID) to the client.
As this shows, from the "endless" loop in the beginning it's now turned into an event based system with remote calls. As remote calls are expensive, it is better to make them able to transfer much data with one connection. Part of that is already in the pseudo code as it's possible to send one or multiple messages to the server and to receive zero or more messages from the server at once.
The ideal implementation would be to have one connection to the server that allows to read and write messages to it in full-duplex. However no such technology exists yet in javascript. These things are under development with Websockets and Webstream APIs and the like but for the moment let's take things simple and look what we have: stateless HTTP requests, some PHP on the server and a MySQL database.
The message stream can be represented in a database table that has an auto-incrementing unique key for the ID and other fields to store the message.
The write transaction script will just connect to the database, insert the message(s) and return the IDs. That's a very common operation and it should be fast (mysql has a sort of memcache bridge which should make the store operation even more fast and convenient).
The read transaction script is equally simple, it will just read all messages with an ID higher than passed to it and return it to the client.
Keep these scripts as simple as possible and optimize the read/write time to the store, so they can execute fast and you're done even with chatting over plain HTTP.
Still your webserver and the overall internet connection might not be fast enough (although there is keep-alive).
However, HTTP should be good enough for the moment to test if you chat system is actually working without any loops, not client, nor server side.
It's also good to keep servers dead simple, because each client relies on them, so they should just do their work and that's it.
You can at any time change the server (or offer different type of servers) that can interact with your chat client by giving the chat client different implementations of the send and receive functions. E.g. I see in your question that you're using comet, this should work as well, it's probably easy to directly implement the server for comet.
If in the future websockets are more accessible (which might never be the case because of security considerations), you can offer another type of server for websockets as well. As long as the data-structure of the stream is intact, this will work with different type of servers next to each other. The database will take care of the congruency.
Hope this is helpful.
Just as an additional note: HTML5 offers something called Stream Updates with Server-Sent Events with an online demo and PHP/JS sources. The HTML 5 feature offers already an event object in javascript which could be used to create an exemplary chat client transport implementation.

I wrote a blog post about how I had to handle a similar problem (using node.js, but the principles apply).
http://j-query.blogspot.com/2011/11/strategies-for-scaling-real-time-web.html
My suggestion is, if it's going to be big either a) you need to cache like crazy on your web server layer, which probably means your AJAX call needs to have a timestamp on it or b) use something like socket.io, which is built for scaling real-time web apps and has built-in support for channels.

Infinite loops in php can and will use 100% of your CPU. Sleep functions will fix that problem. However, you probably don't want to have a separate HTTP process running all the time for every client that is connected to your server because you'll run out of connections. You could just have one php process that looks at all inbound messages and routes them to the right person as they come in. This process could be launched from a cron job once a minute. I've written this type of thing many times and it works like a charm. Note: Make sure you don't run the process if it's already running or you will run into multiprocessing problems (like getting double messages). In other words, you need to make the process thread safe.
If you want to get real time chatting, then you might want to take a look at StreamHub which opens a full duplex connection to the client's browser.

It's not a PHP or jQuery task now. Node.js!
There is socket.io, which means WebSockets.
I'll explain why node.js is better. I have a task to refresh on-page markers every, for example, 10 seconds. I've done it with the first method. When the persistent users count come to 200. Http server and php were in trouble. There were a lot of requests which was unnesessary.
Whats give you Node.js:
Creating separate rooms for chats (here)
Sends data, only for those who has updates (for example, if I do not have any new message my refresh will be blocked when there will be selection from database)
You run 1 query to the DB per 0.5 second, no matter how much users there are
Just look into Node.js and Socket.io. This solution help me with a great boost.

First off, ask yourself if it's necessary to update the chat frequently. What type of chats will be happening? Is it real-time? Simple Q&A? Tech support? Etc. In all but the real-time chat cases, you will be better off using a long polling JS-based design, because instantaneous responses are not that important. If this is for real-time chats, then you should consider a Gmail-like design whereby you keep an XHR open and push messages back to the client as they are received. If connection resources are a concern, you can get by using long polling with a very brief interval (ex. 5-10 seconds).

How does Gmail's periodic mail fetching work?

When I am using gmail, some new mails that I just received appear on the inbox, even if I did not refresh the page. I believe that would done in Ajax.
Is there any good demo that works very similar to it? Periodically checking and getting JSON data (I am not sure if it is JSON data..) to fetch new data??
Thanks!

The problem with periodic refresh is that while it is good for some things that aren't too time critical, like e-mail fetching, it is not instantaneous. Therefore you wouldn't want to use it for something like chat where waiting even five seconds for a response is too long. You could decrease the polling interval and make a request once a second or even half second, but then you'd quickly overload your browser and waste resources.
One solution for this is to use a technique called ajax long polling (known by others as 'comet' or 'reverse ajax'). With this technique, the browser makes a long duration ajax request, one that does not return until there is new data available. That request sits on the server (you need to be running special server side software to handle this in a scalable manner, but you could hack something together with php as a demonstration) until new data is available at which point it returns to the client with the new data. When the client receives the data it makes another long polling request to sit at the server until there is more data. This is the method that gmail uses, I believe.
That is the gist of long polling, you have to make a couple of modifications because most browsers will expire an ajax request if it does not return after a long time, so if the ajax request times out the client has to make another request (but the timeout is usually on the interval of a minute or longer). But this is the main idea.
The server side implementation of this is far more complicated than the client side (client side requires just a few lines of js).

While I'm not sure of the exact gmail implemention, the AjaxPatterns site has a good overview of what they dub a Periodic Refresh: --> http://ajaxpatterns.org/Periodic_Refresh. I've always just referred to the style as a heartbeat.
The gist of their solution is:
The browser periodically issues an XMLHttpRequest Call to gain new information, e.g. one call every five seconds. The solution makes use of the browser's Event Scheduling capabilities to provide a means of keeping the user informed of latest changes.
They include some links to real-world examples and some sample code.

How to make fast, lightweight, economic online chat with PHP + JS + (MySQL?) + (AJAX?)

What way will be best to write online chat with js? If i would use AJAX and update information about users and messages every 5sec - HTTP requests and answers will make big traffic and requests will make high server load.
But how another? Sockets? But how..

You seem to have a problem with the server load, so I'll compare the relevant technologies.
Ajax polling:
This is the most straightforward. You do setTimeout loop every 5 seconds or so often to check for new chat messages or you set an iframe to reload. When you post a message, you also return new messages, and things shouldn't get out of order. The biggest drawback with this method is that you're unlikely to poll with a frequency corresponding to how often messages are posted. Either you'll poll too quickly, and you'll make a lot of extra requests, or you'll poll too slowly and you'll get chunks of messages at a time instead of getting them in a real-time-ish way. This is by far the easiest method though.
HTTP Push
This is the idea that the server should tell the client when there are new messages, rather than the client continually bothering the server asking if there are any new ones yet. Imagine the parent driving and kid asking "are we there yet?", you can just have the parent tell the kid when they get there.
There are a couple ways to both fake this and do it for real. WebSockets, which you mentioned, are actually creating a stream between the client and the server and sending data in real time. This is awesome, and for the 4 out of 10 users that have a browser that can do it, they'll be pretty psyched. Everyone else will have a broken page. Sorry. Maybe in a couple years.
You can also fake push tech with things like long-polling. The idea is that you ask the server if there are any new messages, and the server doesn't answer until a new message has appeared or some preset limit (30 seconds or so) has been reached. This keeps the number of requests to a minimum, while using known web technologies, so most browsers will work with it. You'll have a high connection concurrency, but they really aren't doing anything, so it should have too high a server cost.
I've used all of these before, but ended up going with long polling myself. You can find out more about how to actually do it here: How do I implement basic "Long Polling"?

You should chose sockets rather than AJAX Polling, However there isn't much around about how you can integrate socket based chats with MySQL.
I have done a few tests and have a basic example working here: https://github.com/andrefigueira/PHP-MySQL-Sockets-Chat
It makes use of Ratchet (http://socketo.me/) for the creation of the chat server in PHP.
And you can send chat messages to the DB by sending the server JSON with the information of who is chatting, (if of course you have user sessions)

There are a few ways that give the messages to the client immediately:
HTML5 Websockets
good because you can use them like real sockets
bad because only a few browsers support it
Endlessly-loading frame
good because every browser supports it
not so cool because you have to do AJAX requests for sending stuff
you can send commands to the client by embedding <script> tags in the content
the script gets executed immediately, too!
...
So, in conclusion, I would choose the second way.

The typical approach is to use long polling. Though better not do this in PHP (PHP will need one process for each connection thus drastically limiting the number of possible visitors to your site). Instead use node.js. It's perfect for chats.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.