Pushing data across browsers (same session)


I've seen pages like facebook where, if you post a message in your newsfeed, it automatically pushes that across your browsers. Or like on this page... if someone has answered a question while you are typing, a bar drops down.
Are they just calling AJAX requests every 30 seconds or whatever? It seems like that would be a resource drain on your server. Is there a way to push something at the browser instead?

There are 3 options here:
Use the new (experimental) browser WebSocket API
Long polling / comet
Using / listening to cookies
Long polling / comet example in PHP / AJAX
// PHP SIDE
$max_wait_time = 30; // at most, 30 seconds
$start_time = microtime(true);
while( microtime(true) - $start_time < $max_wait_time ){
// ...check if something changed (eg, run an SQL query or something)
if($something_changed){
echo 'something changed';
die;
}
// if the user did abort, terminate immediately
if( connection_aborted() ) die;
// sleep for one second. For faster responses, keep
// splitting this suitably (eg, 0.5 of a second...)
usleep(1000000);
}
// JS SIDE
var poll = function(){
jQuery.get('the url', function(){
poll();
});
}
poll();
Cookie example in PHP / JS (you need the jQuery cookie plugin)
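<?php
// PHP SIDE
// Keep the value in a variable: setcookie() only sends a header, so
// $_COOKIE is not updated until the next request.
$rand = mt_rand(0, 100);
setcookie('test', (string) $rand);
?><!-- HTML/JS SIDE -->
Rand!
Rand=<span><?php echo $rand; ?></span>
<script type="text/javascript">
var oldrand = <?php echo $rand; ?>;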
setInterval(function(){
var newrand = jQuery.cookie('test');
if( newrand!=oldrand ){
jQuery('span').html(newrand);
oldrand = newrand;
}
}, 500);
</script>
The cookie one is pretty good for several reasons:
it is pretty fast (no AJAX calls)
it is less resource intensive on both client and server side
it consumes less bandwidth / network resources
it is much easier to control
Even in cases where cookies alone cannot do the job, I'd still advocate using a cookie as a signal to run an AJAX call, so that you don't need to run a lot of AJAX calls just to wait for a change to happen.
On the other hand, the cookie approach won't work when the change is made by a third party, e.g. it won't be suitable at all for chat systems.
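The cookie-as-signal idea could be sketched roughly as follows: read the cookie cheaply on a timer and only fire the AJAX call when its value changes. This is a minimal sketch, not a definitive implementation; makeCookieWatcher, readCookie and onChange are hypothetical names, and readCookie stands in for something like jQuery.cookie('test').

```javascript
// Hypothetical helper: remembers the last cookie value seen and calls
// onChange only when a later read returns a different value.
function makeCookieWatcher(readCookie, onChange) {
  var last = readCookie();
  return function check() {
    var current = readCookie();
    if (current !== last) {
      last = current;
      onChange(current); // e.g. run the AJAX call here
      return true;
    }
    return false;
  };
}

// Possible browser usage (assumes the jQuery cookie plugin is loaded;
// updatePage is a hypothetical callback):
// var check = makeCookieWatcher(
//   function () { return jQuery.cookie('test'); },
//   function () { jQuery.get('the url', updatePage); }
// );
// setInterval(check, 500);
```

The reader and the callback are passed in so that the comparison logic stays independent of jQuery and of the cookie name.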

Read into the differences between push and pull for more information:
In your example, the AJAX requests every 30 seconds would be a pull request - constantly asking the server if any updates are available, followed by a response.
You can set up a server/website to send push notifications to the client browser - whereby the client sits quietly, and the server sends the data/information to the client as soon as it is available (reducing network traffic etc.).
Push is much better in my opinion.

Yep, you'd have to poll with a looping Ajax script. To keep resource drain down, you might want to send some kind of hash (the timestamp of the last news item, for instance) so the server knows whether the client is up to date. This way, it can return instantly if there are no changes to push.
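A sketch of that "is the client up to date?" check, written in JavaScript for brevity (the same logic would live in the PHP endpoint). The function name buildPollResponse, the lastSeen parameter and the item shape { stamp, text } are assumptions made for illustration.

```javascript
// items: array of { stamp: number, text: string }, in any order.
// lastSeen: timestamp of the newest item the client already has.
function buildPollResponse(items, lastSeen) {
  var fresh = items.filter(function (item) {
    return item.stamp > lastSeen;
  });
  if (fresh.length === 0) {
    // Nothing new: the server can answer immediately with a tiny payload.
    return { changed: false, lastSeen: lastSeen };
  }
  var newest = fresh.reduce(function (max, item) {
    return Math.max(max, item.stamp);
  }, lastSeen);
  return { changed: true, lastSeen: newest, items: fresh };
}
```

The client would send back the returned lastSeen value with its next poll, so each request only ever carries items it has not seen yet.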

Related

Can multiple clients receive the same data (with AJAX) sent from one unique php, at the same time?

Lately I've been researching about technologies with which I could develop a simple chat using PHP and AJAX. Since I prefer not to install any software, I've decided to use SSE (Server Sent Events) instead of websockets.
This method allows the client who calls the php file to receive data without making a request, which is what I want, but the problem is that every new client connecting executes the same php file again, instead of accessing the one that is already running and getting the same responses as every other client.
This means that if multiple clients call the php file to receive data, the server will overload at some point. Knowing that, would it be possible to make every client receive the data coming from a single executing php file?
Here is what is going on:
This is okay, but as you can see, I end up having the same file executing two times, one for every client.
Here is what I want to do:
As you can see here, there is only one php file sending data, but multiple clients are receiving it.
How is this possible?
myFile.php
<?php
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');
$number = 0;
while(true){
echo "data: $number ";
echo "\n\n";
$number++;
sleep(1);
#ob_flush();
flush();
}
?>
client.html
<!DOCTYPE html>
<html>
<body>
<h1>Getting server updates</h1>
<div id="result"></div>
<script>
if(typeof(EventSource) !== "undefined") {
var source = new EventSource("myFile.php");
source.onmessage = function(event) {
document.getElementById("result").innerHTML += event.data + " ";
};
} else {
document.getElementById("result").innerHTML = "Your browser doesn't support SSE (server-sent events)";
}
</script>
</body>
</html>
Thanks in advance.

Okay so I've finally solved the question. The answer is NO, multiple clients can't receive data from one single executing SSE script, since that would mean a huge security vulnerability for any client that uses it.
Apart from that, a PHP script that has started a session keeps that client's session locked until the script ends or closes the session. This comes from PHP's default session handling rather than from SSE itself, but a long-running SSE script makes it very visible. So if some session parameter changes while the SSE script is running, the script won't read it. The only way to change session parameters during the execution is to use the session_commit() function (an alias of session_write_close()) to commit those session changes.
This also means that while the session is locked, the client won't be able to receive any other HTTP response that uses the same session until the SSE script closes. Every such response will stay pending until that happens.
This applies to AJAX too, since AJAX calls are just asynchronous HTTP requests. Once the SSE script locks the session, the client will stop receiving AJAX responses until the SSE script stops, or until the SSE script calls session_commit().
This post helped me to understand how it works.

Are AJAX interval functions bad for servers?

I have this function set to run at intervals using JavaScript which points to a PHP page. I was just wondering if this is bad practice for the server load, especially at scale.
The PHP page just retrieves data from the Slack API through CURL and just echoes it out back.
setInterval(function(){
$.post('slackAPI.php?',function(data){
//append data received to some class
}, 'html');
}, 100);
This option works perfectly but I'm worried that it'll cause a heavy load on the server, is there a better option to retrieve data from an API in real time?

Your server load stems from two sources:
having to handle the incoming AJAX call,
having to issue a Slack call and get its return.
Depending on the scenario you can improve either of them (or even both):
Cache third-party calls
If the third-party information is unlikely to be updated more often than every 'x' seconds, you can store the call's result in a database, in-memory key store, or cache file, and read its contents while it's still "fresh".
Even better, you can use GET to retrieve the datum and issue an appropriate Expires header from the PHP side. This will prevent most AJAX libraries from issuing unnecessary calls. Of course, you now run the risk of not immediately retrieving fresh information that arrived in the meantime:
// 30 seconds timeout
Header('Expires: '.gmdate('D, d M Y H:i:s \\G\\M\\T', time() + 30));
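The "read it while it's still fresh" rule for cached third-party calls can be sketched like this. It is shown in JavaScript with a synchronous fetcher purely for illustration; createCachedFetcher, fetcher, ttlMs and now are hypothetical names, and on the server the same check would sit in front of the Slack call.

```javascript
// fetcher: function that performs the expensive third-party call.
// ttlMs:   how long a stored result counts as fresh.
// now:     clock function, injectable for testing (defaults to Date.now).
function createCachedFetcher(fetcher, ttlMs, now) {
  now = now || Date.now;
  var cachedValue;
  var fetchedAt = null;
  return function get() {
    // Refetch only when nothing is cached yet or the entry has gone stale.
    if (fetchedAt === null || now() - fetchedAt >= ttlMs) {
      cachedValue = fetcher();
      fetchedAt = now();
    }
    return cachedValue;
  };
}
```

With a 30-second ttlMs, any number of incoming calls within that window would cost a single third-party request.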
Replace fixed interval with chained AJAX calls.
Instead of using a fixed interval, if you can whip up an API that waits for data to come, you can put the updating function as a callback of the updating function itself.
This way, you will issue a request that will take, say, from 100 to 20000 milliseconds to complete. Before it completes, there is no data, so it would have been useless to issue 199 other calls to the same API in the meanwhile. When it completes, it immediately fires off a new AJAX call, which will wait again, and so on. This is conveniently done with a jQuery Promise loop. Because each new call is scheduled from an asynchronous callback (here via setTimeout), the call stack does not grow, so there is no recursion limit to worry about.
Another advantage is that you can control the client update frequency from the server.
You would do this using setTimeout, not setInterval.
function refresher() {
$.get('/your/api/endpoint', { maxtimeout: 30 })
.then(function(reply) {
if (reply.changes) {
// ...update the UI? Here you call your "old" function.
}
// Immediately issue another call, which will wait.
window.setTimeout(refresher, 1);
});
}
// Call immediately. The return callback will wait.
refresher();
The best option is the third:
Change API.
Most such services let you either issue a long-blocking call (the one you would use above) or register a URL which will receive the data when there's data to be fetched. Then you store the information and cache it, while the receiving endpoint on the server keeps the cache updated. You now only have to worry about the inbound calls, which are quickly handled.
The inbound call could then block and wait, consuming very few resources:
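// $cacheFile is written by the callback API entry point.
// $lastCreationTime is the mtime the client last saw; it is assumed to have
// been cast with (int) beforehand, since !== also compares types.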
$ms = 200; // Granularity is 200 milliseconds
$seconds = 30; // Max linger time is 30 seconds (could be higher)
$delay = floor(($seconds * 1000.0) / $ms);
while ($delay--) {
// filemtime must read up-to-date information, NOT stale.
clearstatcache(false, $cacheFile);
if (filemtime($cacheFile) !== $lastCreationTime) {
break;
}
usleep($ms * 1000);
}
Header('Content-Type: application/json;charset=UTF-8');
readfile($cacheFile);
die();
The above will have an overhead of at most 150 stat() calls (30 seconds, 5 calls per second), which is absolutely negligible, and will save 149 web server calls and all related traffic and delays.

Efficient and user-friendly way to present slow-loading results

I have read many similar questions concerning cancelling a POST request with jQuery, but none seem to be close to mine.
I have your everyday form that has a PHP-page as an action:
<form action="results.php">
<input name="my-input" type="text">
<input type="submit" value="submit">
</form>
Processing results.php on the server-side, based on the post information given in the form, takes a long time (30 seconds or even more and we expect an increase because our search space will increase as well in the coming weeks). We are accessing a Basex server (version 7.9, not upgradable) that contains all the data. A user-generated XPath code is submitted in a form, and the action url then sends the XPath code to the Basex server which returns the results. From a usability perspective, I already show a "loading" screen so users at least know that the results are being generated:
$("form").submit(function() {
$("#overlay").show();
});
<div id="overlay"><p>Results are being generated</p></div>
However, I would also want to give users the option to press a button to cancel the request and cancel the request when a user closes the page. Note that in the former case (on button click) this also means that the user should stay on the same page, can edit their input, and immediately re-submit their request. It is paramount that when they cancel the request, they can also immediately resend it: the server should really abort, and not finish the query before being able to process a new query.
I figured something like this:
$("form").submit(function() {
$("#overlay").show();
});
$("#overlay button").click(abortRequest);
$(window).unload(abortRequest);
function abortRequest() {
// abort correct request
}
<div id="overlay">
<p>Results are being generated</p>
<button>Cancel</button>
</div>
But as you can see, I am not entirely sure how to fill in abortRequest to make sure the post request is aborted, and terminated, so that a new query can be sent. Please fill in the blanks! Or would I need to .preventDefault() the form submission and instead do an ajax() call from jQuery?
As I said I also want to stop the process server-side, and from what I read I need exit() for this. But how can I exit another PHP function? For example, let's say that in results.php I have a processing script and I need to exit that script, would I do something like this?
<?php
if (isset($_POST['my-input'])) {
$input = $_POST['my-input'];
function processData() {
// A lot of processing
}
processData();
}
if (isset($_POST['terminate'])) {
function terminateProcess() {
// exit processData()
}
}
and then do a new ajax request when I need to terminate the process?
$("#overlay button").click(abortRequest);
$(window).unload(abortRequest);
function abortRequest() {
$.ajax({
url: 'results.php',
data: {terminate: true},
type: 'post',
success: function() { alert("terminated"); }
});
}
I did some more research and I found this answer. It mentions connection_aborted() and also session_write_close() and I'm not entirely sure which is useful for me. I do use SESSION variables, but I don't need to write away values when the process is cancelled (though I would like to keep the SESSION variables active).
Would this be the way? And if so, how do I make one PHP function terminate the other?
I have also read into Websockets and it seems something that could work, but I don't like the hassle of setting up a Websocket server as this would require me to contact our IT guy who requires extensive testing on new packages. I'd rather keep it to PHP and JS, without third party libraries other than jQuery.
Considering most comments and answers suggest that what I want is not possible, I am also interested to hear alternatives. The first thing that comes to mind is paged Ajax calls (similar to many web pages that serve search results, images, what-have-you in an infinite scroll). A user is served a page with the first X results (e.g. 20), and when they click a "show next 20 results" button, those are appended. This process can continue until all results are shown. Because it is useful for users to get all results, I will also provide a "download all results" option. This will then take very long as well, but at least users should be able to go through the first results on the page itself. (The download button should thus not disrupt the Ajax paged loads.) It's just an idea, but I hope it gives some of you some inspiration.

On my understanding the key points are:
You cannot cancel a specific request once a form has been submitted natively. On the client side you have no handle with which to identify the state of the form request (whether it has been posted, is being processed, etc.). The only way to cancel it is to reset the $_POST variables and/or refresh the page, which breaks the connection so that the previous request is not completed.
On your alternative solution: when you send another Ajax call with {terminate: true}, results.php can stop processing with a simple die(). But because that is a separate asynchronous call, you cannot map it to the previous form submission, so in practice this will not work.
Probable solution: submit the form with Ajax. With jQuery's ajax() you get an xhr object which you can abort() upon window unload.
UPDATE (upon the comment):
A synchronous request is one where your page blocks (all user actions) until the result is ready. Pressing a submit button in the form makes a synchronous call to the server by submitting the form, by definition [https://www.w3.org/TR/html-markup/button.submit.html].
Once the user has pressed the submit button, the connection from browser to server is synchronous, so it is not interrupted until the result arrives. If other calls are made to the server while the submission is in progress, they have no reference to that operation, because it has not finished. This is why sending a termination call with Ajax will not work.
Thirdly: for your case you can consider the following code example:
HTML:
<form action="results.php">
<input name="my-input" type="text">
<input id="resultMaker" type="button" value="submit">
</form>
<div id="overlay">
<p>Results are being generated</p>
<button>Cancel</button>
</div>
JQUERY:
<script type="text/javascript">
var jqXhr = '';
$('#resultMaker').on('click', function(){
$("#overlay").show();
jqXhr = $.ajax({
url: 'results.php',
data: $('form').serialize(),
type: 'post',
success: function() {
$("#overlay").hide();
}
});
});
var abortRequest = function(){
if (jqXhr != '') {
jqXhr.abort();
}
};
$("#overlay button").on('click', abortRequest);
window.addEventListener('unload', abortRequest);
</script>
This is example code - i just have used your code examples and changed something here and there.

Himel Nag Rana demonstrated how to cancel a pending Ajax request.
Several factors may interfere and delay subsequent requests, as I have discussed earlier in another post.
TL;DR: 1. it is very inconvenient to try to detect the request was cancelled from within the long-running task itself and 2. as a workaround you should close the session (session_write_close()) as early as possible in your long-running task so as to not block subsequent requests.
connection_aborted() cannot be used. This function is supposed to be called periodically during a long task (typically, inside a loop). Unfortunately there is just one single significant, atomic operation in your case: the query to the data back end.
If you applied the procedures advised by Himel Nag Rana and myself, you should now be able to cancel the Ajax request and immediately allow new requests to proceed. The only concern that remains is that the previous (cancelled) request may keep running in the background for a while (not blocking the user, just wasting resources on the server).
The problem could be rephrased to "how to abort a specific process from the outside".
As Christian Bonato rightfully advised, here is a possible implementation. For the sake of the demonstration I will rely on Symfony's Process component, but you can devise a simpler custom solution if you prefer.
The basic approach is:
Spawn a new process to run the query, save the PID in session. Wait for it to complete, then return the result to the client
If the client aborts, it signals the server to just kill the process.
<?php // query.php
use Symfony\Component\Process\PhpProcess;
session_start();
if(isset($_SESSION['queryPID'])) {
// A query is already running for this session
// As this should never happen, you may want to raise an error instead
// of just silently killing the previous query.
posix_kill($_SESSION['queryPID'], SIGKILL);
unset($_SESSION['queryPID']);
}
$queryString = parseRequest($_POST);
$process = new PhpProcess(sprintf(
'<?php $result = runQuery(%s); echo fetchResult($result);',
$queryString
));
$process->start();
$_SESSION['queryPID'] = $process->getPid();
session_write_close();
$process->wait();
$result = $process->getOutput();
echo formatResponse($result);
?>
<?php // abort.php
session_start();
if(isset($_SESSION['queryPID'])) {
$pid = $_SESSION['queryPID'];
posix_kill($pid, SIGKILL);
unset($_SESSION['queryPID']);
echo "Query $pid has been aborted";
} else {
// there is nothing to abort, send a HTTP error code
header($_SERVER['SERVER_PROTOCOL'] . ' 599 No pending query', true, 599);
}
?>
// javascript
function abortRequest(pendingXHRRequest) {
pendingXHRRequest.abort();
$.ajax({
url: 'abort.php',
success: function() { alert("terminated"); }
});
}
Spawning a process and keeping track of it is genuinely tricky, this is why I advised using existing modules. Integrating just one Symfony component should be relatively easy via Composer: first install Composer, then the Process component (composer require symfony/process).
A manual implementation could look like this (beware, this is untested, incomplete and possibly unstable, but I trust you will get the idea):
<?php // query.php
session_start();
$queryString = parseRequest($_POST); // $queryString should be escaped via escapeshellarg()
$processHandler = popen("/path/to/php-cli/php asyncQuery.php $queryString", 'r');
// fetch the first line of output, PID expected
$pid = (int) trim(fgets($processHandler)); // strip the trailing newline
$_SESSION['queryPID'] = $pid;
session_write_close();
// fetch the rest of the output
while($line = fgets($processHandler)) {
echo $line; // or save this line for further processing, e.g. through json_encode()
}
pclose($processHandler); // handles opened with popen() are closed with pclose()
?>
<?php // asyncQuery.php
// echo the current PID
echo getmypid() . PHP_EOL;
// then execute the query and echo the result
$result = runQuery($argv[1]);
echo fetchResult($result);
?>

With BaseX 8.4, a new RESTXQ annotation %rest:single was introduced, which allows you to cancel a running server-side request: http://docs.basex.org/wiki/RESTXQ#Query_Execution. It should solve at least some of the challenges you described.
The current way to only return chunks of the result is to pass on the index to the first and last result in your result, and to do the filtering in XQuery:
$results[position() = $start to $end]
By returning one more result than requested, the client will know that there will be more results. This may be helpful, because computing the total result size is often much more expensive than returning only the first results.
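On the client side, the "one extra result" trick could be handled roughly as follows. This is only a sketch; pageRange and splitPage are hypothetical helper names, and the positions are 1-based and inclusive to match the XQuery position() filter above.

```javascript
// Positions to request for a given page. The range deliberately covers
// pageSize + 1 rows so the client can tell whether more results exist.
function pageRange(pageIndex, pageSize) {
  var start = pageIndex * pageSize + 1;
  return { start: start, end: start + pageSize };
}

// rows: what the server returned for that range (at most pageSize + 1 rows).
function splitPage(rows, pageSize) {
  return {
    items: rows.slice(0, pageSize),  // what to display
    hasMore: rows.length > pageSize  // the extra row signals another page
  };
}
```

The extra row is dropped before display, so it only ever serves as the "show next" signal.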

I hope I understood this correctly.
Instead of letting the browser "natively" submit the FORM, don't: write JS code that does this instead. In other words (I didn't test this; so interpret as pseudo-code):
<form action="results.php" onsubmit="return false;">
<input name="my-input" type="text">
<input type="submit" value="submit">
</form>
So, now, when that "submit" button is clicked, nothing will happen.
Obviously, you want your form POSTed, so write JS to attach a click handler on that submit button, collect values from all input fields in the form (actually, it is NOT nearly as scary as it sounds; check out the link below), and send it to the server, while saving the reference to the request (check the 2nd link below), so that you can abort it (and maybe signal the server to quit also) when the cancel-button is clicked (alternatively, you can simply abandon it, by not caring about the results).
Submit a form using jQuery
Abort Ajax requests using jQuery
Alternatively, to make that HTML markup "clearer" relative to its functionality, consider not using the FORM tag at all: otherwise, what I suggested makes its usage confusing (why is it there if it's not used; know what I mean?). But don't get distracted by this suggestion until you make it work the way you want; it's optional and a topic for another day (it might even relate to changing the architecture of the whole site).
HOWEVER, a thing to think about: what to do if the form-post already reached the server and server already started processing it and some "world" changes have already been made? Maybe your get-results routine doesn't change data, so then that's fine. But, this approach probably cannot be used with change-data POSTs with the expectation that "world" won't change if cancel-button is clicked.
I hope that helps :)

The user doesn't have to experience this synchronously.
Client posts a request
The server receives the client request and assigns an ID to it
The server "kicks off" the search and responds with a zero-data page and search ID
The client receives the "placeholder" page and starts checking if the results are ready based on the ID (with something like polling or websockets)
Once the search has completed, the server responds with the results next time it's polled (or notifies the client directly when using websockets)
This is fine when performance isn't quite the bottleneck and the nature of processing makes longer wait times acceptable. Think flight search aggregators that routinely run for 30-90 seconds, or report generators that have to be scheduled and run for even longer!
You can make the experience less frustrating if you don't block user interactions, keep them updated of search progress and start showing results as they come in if possible.
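The bookkeeping behind those steps can be sketched with a small in-memory job store. This is an illustrative sketch only: createJobStore and its method names are hypothetical, and a real server would persist jobs somewhere shared between requests rather than in a local variable.

```javascript
function createJobStore() {
  var jobs = {};
  var nextId = 1;
  return {
    // Register a search and hand back the ID the client will poll with.
    submit: function (query) {
      var id = String(nextId++);
      jobs[id] = { query: query, status: "pending", result: null };
      return id;
    },
    // Called by whatever runs the search once it has finished.
    complete: function (id, result) {
      if (!jobs[id]) return false;
      jobs[id].status = "done";
      jobs[id].result = result;
      return true;
    },
    // What the polling endpoint would return for a given ID.
    poll: function (id) {
      var job = jobs[id];
      if (!job) return { status: "unknown" };
      if (job.status === "done") return { status: "done", result: job.result };
      return { status: "pending" };
    }
  };
}
```

The client keeps calling poll with its ID until the status is "done", which matches the placeholder-page flow in the list above.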

You must solve this conceptually first before writing any code. Here are some things that come to mind offhand:
What does it mean to free up resources on the server?
What constitutes a graceful abort that will free up resources?
Is it enough to kill the PHP process waiting for the query result(s)? If so, the route suggested by RandomSeed could be interesting. Just keep in mind that it will only work on a single server. If you have multiple load balanced servers you won't have a way to kill a process on another server (not as easily at least).
Or do you need to cancel the database request from the database itself? In that case the answer suggested by Christian Grün is of more interest.
Or is it that there is no graceful shutdown and you have to force everything to die? If so, this seems awfully hacky.
Not all clients are going to explicitly abort
Some clients are going to close the browser, but their last request won't come through; some clients will lose internet connection and leave the service hanging, etc. You are not guaranteed to get an "abort" request when a client disconnects or has gone away.
You have to decide whether to live with potentially unwanted behavior, or implement an additional active state tracking, e.g. client pinging server for keepalive.
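Active state tracking of that kind could look roughly like this: each client pings periodically, and any client not heard from within a timeout is treated as gone so its query can be cleaned up. The names createHeartbeatTracker, ping, expired and forget are hypothetical, and the clock is injectable only to make the sketch testable.

```javascript
// timeoutMs: how long a client may stay silent before it counts as gone.
// now:       clock function (defaults to Date.now).
function createHeartbeatTracker(timeoutMs, now) {
  now = now || Date.now;
  var lastSeen = {};
  return {
    // Record a keepalive ping from a client.
    ping: function (clientId) {
      lastSeen[clientId] = now();
    },
    // IDs of clients whose last ping is older than the timeout.
    expired: function () {
      var t = now();
      return Object.keys(lastSeen).filter(function (id) {
        return t - lastSeen[id] > timeoutMs;
      });
    },
    // Stop tracking a client, e.g. after its query was aborted.
    forget: function (clientId) {
      delete lastSeen[clientId];
    }
  };
}
```

A periodic server-side task would call expired(), abort the matching queries, and then forget() those clients.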
Side notes
30 secs or greater query time is potentially long, is there a better tool for the job; so you won't have to solve this with a hack like this?
you are looking for features of a concurrent system, but you're not using a concurrent system; if you want concurrency use a better tool/environment for it, e.g. Erlang.

PHP: Longpolling & Comet related

Recently, I have been planning an instant-notification system for my website. I heard COMET is essential in such cases.
I have been searching about PHP & Comet for a while already; however, the guides & articles I have found seem to be just ajax requests in a loop. For example, there is basic javascript code which gets the value from a PHP file every 2 seconds and outputs it to HTML. As far as I know, it should be COMET pushing new values to HTML, hence the loop should be on the server side, not the client. Half of the articles in my native language were using setInterval() to contact a PHP file every X seconds.
So, I have some questions to ask you.
Are there any guides or examples that are easy to understand and don't use an external framework like XAJAX/NOLOH?
What is the performance difference between using COMET on the server side and requesting a value from ajax.php every X seconds?
Can the timed requests I mentioned above be called COMET? (ex. Long Polling using jQuery and PHP)
Do I need any extensions to run COMET serverside? (My webhost is using Apache, I personally use Nginx)

You have to use a client-side script (AJAX), because the server has to be polled. The server cannot simply send messages to someone's browser without an open connection.
I'm not too familiar with HTML5 websockets, but I believe they allow you to keep a persistent connection with the server; however, HTML5 browsers aren't yet used widely enough for this to be a solution on a 'public' website.
How long polling works is that an asynchronous request is sent from the browser with a long time-out (e.g. 30 seconds). When the request arrives at the server, it checks for new messages, but when there are no messages to be displayed, instead of directly outputting the result, it goes into a loop, polling the database e.g. every second (using sleep to postpone the queries), until a message has been found. When a message has been found, it terminates the loop and outputs the result. If there have been no messages after 30 seconds, the script times out and sends back an empty response.
So the response can come back after anywhere between 0 and 30 seconds. As soon as the response arrives in the browser, it is handled and a new 30-second request is sent.
As for your questions;
You will need a client-side framework for doing the polling
You cannot use Comet on the server side only. The gain of long polling over normal polling (e.g. polling every second) is significant because you make far fewer server requests
To my understanding; yes
You can use any server-side language, as long as it can keep the connection open while querying for messages.
Also take a look at http://nodejs.org/

I don't know exactly what COMET means, but for this purpose you have several solutions.
One, as you mentioned, is long polling with ajax. It is simple and does not require a new (HTML5) browser.
One more option is server-sent events. It requires a browser with HTML5 support, but it keeps the connection alive without polling:
client:
if (window.EventSource) {
window.onload = function() {
window.scrollTo(0,1);
setTimeout(
function() {
var source = new EventSource("events.php");
source.onmessage = function (event) {
document.body.innerHTML += event.data + "<br>";
};
}, 1000);
};
} else {
document.write("Please visit this page in a browser that supports EventSource to see the test");
}
server:
if ($_SERVER['HTTP_ACCEPT'] === 'text/event-stream') {
header('Content-Type: text/event-stream');
echo "data: This is the first event\n\n";
flush();
$i = 5;
while (--$i) {
sleep(1);
$time = date('r');
echo "data: The server time is: {$time}\n\n";
flush();
}
} else {
echo 'This demo is for use with an EventSource compatible browser.';
}
goodluck.

How do I implement basic "Long Polling"?

I can find lots of information on how Long Polling works (For example, this, and this), but no simple examples of how to implement this in code.
All I can find is cometd, which relies on the Dojo JS framework, and a fairly complex server system..
Basically, how would I use Apache to serve the requests, and how would I write a simple script (say, in PHP) which would "long-poll" the server for new messages?
The example doesn't have to be scaleable, secure or complete, it just needs to work!

It's simpler than I initially thought.. Basically you have a page that does nothing, until the data you want to send is available (say, a new message arrives).
Here is a really basic example, which sends a simple string after 2-10 seconds. 1 in 3 chance of returning an error 404 (to show error handling in the coming Javascript example)
msgsrv.php
<?php
if(rand(1,3) == 1){
/* Fake an error */
header("HTTP/1.0 404 Not Found");
die();
}
/* Send a string after a random number of seconds (2-10) */
sleep(rand(2,10));
echo("Hi! Have a random number: " . rand(1,10));
?>
Note: With a real site, running this on a regular web-server like Apache will quickly tie up all the "worker threads" and leave it unable to respond to other requests.. There are ways around this, but it is recommended to write a "long-poll server" in something like Python's twisted, which does not rely on one thread per request. cometD is a popular one (which is available in several languages), and Tornado is a new framework made specifically for such tasks (it was built for FriendFeed's long-polling code)... but as a simple example, Apache is more than adequate! This script could easily be written in any language (I chose Apache/PHP as they are very common, and I happened to be running them locally)
Then, in Javascript, you request the above file (msgsrv.php), and wait for a response. When you get one, you act upon the data. Then you request the file and wait again, act upon the data (and repeat)
What follows is an example of such a page.. When the page is loaded, it sends the initial request for the msgsrv.php file.. If it succeeds, we append the message to the #messages div, then after 1 second we call the waitForMsg function again, which triggers the wait.
The 1 second setTimeout() is a really basic rate-limiter, it works fine without this, but if msgsrv.php always returns instantly (with a syntax error, for example) - you flood the browser and it can quickly freeze up. This would better be done checking if the file contains a valid JSON response, and/or keeping a running total of requests-per-minute/second, and pausing appropriately.
If the page errors, it appends the error to the #messages div, waits 15 seconds and then tries again (identical to how we wait 1 second after each message)
The nice thing about this approach is it is very resilient. If the clients internet connection dies, it will timeout, then try and reconnect - this is inherent in how long polling works, no complicated error-handling is required
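The "running total of requests per minute" idea mentioned above could be sketched as a small sliding-window limiter. This is a hedged sketch, not part of the original example: createRequestLimiter, maxPerWindow, windowMs and now are hypothetical names.

```javascript
// Allows at most maxPerWindow requests in any windowMs-long window.
// The returned function yields 0 when a request may be sent right now
// (and records it), or the number of milliseconds to wait otherwise.
function createRequestLimiter(maxPerWindow, windowMs, now) {
  now = now || Date.now;
  var stamps = [];
  return function nextDelay() {
    var t = now();
    // Forget requests that have fallen out of the window.
    stamps = stamps.filter(function (s) {
      return t - s < windowMs;
    });
    if (stamps.length < maxPerWindow) {
      stamps.push(t);
      return 0;
    }
    // Wait until the oldest recorded request leaves the window.
    return stamps[0] + windowMs - t;
  };
}
```

A polling function could call nextDelay() before each request and, when it returns a non-zero value, reschedule itself with setTimeout for that many milliseconds instead of sending.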
Anyway, the long_poller.htm code, using the jQuery framework:
<html>
<head>
<title>BargePoller</title>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.2.6/jquery.min.js" type="text/javascript" charset="utf-8"></script>
<style type="text/css" media="screen">
body{ background:#000;color:#fff;font-size:.9em; }
.msg{ background:#aaa;padding:.2em; border-bottom:1px #000 solid}
.old{ background-color:#246499;}
.new{ background-color:#3B9957;}
.error{ background-color:#992E36;}
</style>
<script type="text/javascript" charset="utf-8">
function addmsg(type, msg){
/* Simple helper to add a div.
type is the name of a CSS class (old/new/error).
msg is the contents of the div */
$("#messages").append(
"<div class='msg "+ type +"'>"+ msg +"</div>"
);
}
function waitForMsg(){
/* This requests the url "msgsrv.php".
When it completes (or errors), the callbacks below schedule the next request */
$.ajax({
type: "GET",
url: "msgsrv.php",
async: true, /* If set to non-async, browser shows page as "Loading.."*/
cache: false,
timeout:50000, /* Timeout in ms */
success: function(data){ /* called when request to msgsrv.php completes */
addmsg("new", data); /* Add response to a .msg div (with the "new" class)*/
setTimeout(
waitForMsg, /* Request next message */
1000 /* ..after 1 seconds */
);
},
error: function(XMLHttpRequest, textStatus, errorThrown){
addmsg("error", textStatus + " (" + errorThrown + ")");
setTimeout(
waitForMsg, /* Try again after.. */
15000); /* milliseconds (15seconds) */
}
});
};
$(document).ready(function(){
waitForMsg(); /* Start the initial request */
});
</script>
</head>
<body>
<div id="messages">
<div class="msg old">
BargePoll message requester!
</div>
</div>
</body>
</html>

I've got a really simple chat example as part of slosh.
Edit: (since everyone's pasting their code in here)
This is the complete JSON-based multi-user chat using long-polling and slosh. This is a demo of how to do the calls, so please ignore the XSS problems. Nobody should deploy this without sanitizing it first.
Notice that the client always has a connection to the server, and as soon as anyone sends a message, everyone should see it roughly instantly.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- Copyright (c) 2008 Dustin Sallings <dustin+html#spy.net> -->
<html lang="en">
<head>
<title>slosh chat</title>
<script type="text/javascript"
src="http://code.jquery.com/jquery-latest.js"></script>
<link title="Default" rel="stylesheet" media="screen" href="style.css" />
</head>
<body>
<h1>Welcome to Slosh Chat</h1>
<div id="messages">
<div>
<span class="from">First!:</span>
<span class="msg">Welcome to chat. Please don't hurt each other.</span>
</div>
</div>
<form method="post" action="#">
<div>Nick: <input id='from' type="text" name="from"/></div>
<div>Message:</div>
<div><textarea id='msg' name="msg"></textarea></div>
<div><input type="submit" value="Say it" id="submit"/></div>
</form>
<script type="text/javascript">
function gotData(json, st) {
var msgs=$('#messages');
$.each(json.res, function(idx, p) {
var from = p.from[0]
var msg = p.msg[0]
msgs.append("<div><span class='from'>" + from + ":</span>" +
" <span class='msg'>" + msg + "</span></div>");
});
// The jQuery wrapped msgs above does not work here.
var msgs=document.getElementById("messages");
msgs.scrollTop = msgs.scrollHeight;
}
function getNewComments() {
$.getJSON('/topics/chat.json', gotData);
}
$(document).ready(function() {
$(document).ajaxStop(getNewComments);
$("form").submit(function() {
$.post('/topics/chat', $('form').serialize());
return false;
});
getNewComments();
});
</script>
</body>
</html>

Tornado is designed for long-polling, and includes a very minimal (few hundred lines of Python) chat app in /examples/chatdemo, including server code and JS client code. It works like this:
Clients use JS to ask for updates since a given point (the number of the last message). The server's URL handler receives these requests and adds a callback for responding to each client to a queue.
When the server gets a new message, the onmessage event fires, loops through the callbacks, and sends the messages.
The client-side JS receives the message, adds it to the page, then asks for updates since this new message ID.
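The flow above (park a callback per waiting client, flush them all when a message arrives) can be sketched in a few lines. This is not Tornado's code or API; MessageBuffer, waitForMessages and newMessage are hypothetical names, and it is written in JavaScript only to match the other examples in this thread.

```javascript
// Sketch: waiting clients park a callback; a new message answers all of them.
class MessageBuffer {
  constructor() {
    this.messages = []; // stored messages: { id, body }
    this.waiters = [];  // callbacks of clients currently waiting
    this.nextId = 1;
  }

  // A client asks for everything after `cursor` (the last message id it has seen).
  waitForMessages(cursor, callback) {
    const missed = this.messages.filter((m) => m.id > cursor);
    if (missed.length > 0) {
      callback(missed);            // there is already news: answer immediately
    } else {
      this.waiters.push(callback); // otherwise hold the request open
    }
  }

  // A new message arrives: store it and answer every waiting client.
  newMessage(body) {
    const message = { id: this.nextId++, body };
    this.messages.push(message);
    const waiters = this.waiters;
    this.waiters = [];
    waiters.forEach((cb) => cb([message]));
    return message;
  }
}
```

In a real server each callback would write the messages to the held HTTP response and end it; the client then immediately asks again with the new last ID as its cursor.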

I think the client looks like a normal asynchronous AJAX request, but you expect it to take a "long time" to come back.
The server then looks like this.
while (!hasNewData())
usleep(50);
outputNewData();
So, the AJAX request goes to the server, probably including a timestamp of when it was last updated, so that your hasNewData() knows what data you have already got.
The server then sits in a loop sleeping until new data is available. All the while, your AJAX request is still connected, just hanging there waiting for data.
Finally, when new data is available, the server gives it to your AJAX request and closes the connection.
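For comparison, here is a rough non-blocking version of the same wait loop, with the timestamp passed in and an upper limit on how long the request is held. It is a sketch under assumptions: waitForNewData, fetchSince, intervalMs and timeoutMs are invented names, and fetchSince stands in for whatever query tells you what changed after `since`.

```javascript
// Small promise-based sleep so the loop does not block other requests.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Resolves with the new data, or with null if nothing arrived before the timeout.
async function waitForNewData(fetchSince, since, { intervalMs = 500, timeoutMs = 30000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    // e.g. a database query for rows newer than `since`; must return an array
    const data = fetchSince(since);
    if (data.length > 0) return data;
    await sleep(intervalMs);
  }
  return null; // the client should simply reconnect and wait again
}
```

The request handler would await this function and then send the result (or an empty response on null) before closing the connection.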

Here are some classes I use for long-polling in C#. There are basically 6 classes (see below).
Controller: Processes actions required to create a valid response (db operations etc.)
Processor: Manages asynch communication with the web page (itself)
IAsynchProcessor: The service processes instances that implement this interface
Service: Processes request objects that implement IAsynchProcessor
Request: The IAsynchProcessor wrapper containing your response (object)
Response: Contains custom objects or fields

This is a nice 5-minute screencast on how to do long polling using PHP & jQuery:
http://screenr.com/SNH
Code is quite similar to dbr's example above.

Here is a simple long-polling example in PHP by Erik Dubbelboer using the Content-type: multipart/x-mixed-replace header:
<?php
header('Content-type: multipart/x-mixed-replace; boundary=endofsection');
// Keep in mind that the empty line ("\n\n") is important to separate the
// headers from the content.
echo "Content-type: text/plain\n\n";
echo "After 5 seconds this will go away and a cat will appear...\n";
echo "--endofsection\n";
flush(); // Don't forget to flush the content to the browser.
sleep(5);
echo "Content-type: image/jpeg\n\n";
$stream = fopen('cat.jpg', 'rb');
fpassthru($stream);
fclose($stream);
echo "\n--endofsection\n";
And here is a demo:
http://dubbelboer.com/multipart.php

I used this to get to grips with Comet. I have also set up Comet using the Java GlassFish server, and found lots of other examples by subscribing to cometdaily.com.

Take a look at this blog post which has code for a simple chat app in Python/Django/gevent.

Below is a long polling solution I have developed for Inform8 Web. Basically you override the class and implement the loadData method. When the loadData returns a value or the operation times out it will print the result and return.
If the processing of your script may take longer than 30 seconds you may need to alter the set_time_limit() call to something longer.
Apache 2.0 license. Latest version on github
https://github.com/ryanhend/Inform8/blob/master/Inform8-web/src/config/lib/Inform8/longpoll/LongPoller.php
Ryan
abstract class LongPoller {
protected $sleepTime = 5;
protected $timeoutTime = 30;
function __construct() {
}
function setTimeout($timeout) {
$this->timeoutTime = $timeout;
}
function setSleep($sleep) {
$this->sleepTime = $sleep;
}
public function run() {
$data = NULL;
$timeout = 0;
set_time_limit($this->timeoutTime + $this->sleepTime + 15);
//Query database for data
while($data == NULL && $timeout < $this->timeoutTime) {
$data = $this->loadData();
if($data == NULL){
//No new orders, flush to notify php still alive
flush();
//Wait for new Messages
sleep($this->sleepTime);
$timeout += $this->sleepTime;
}else{
echo $data;
flush();
}
}
}
protected abstract function loadData();
}

This is one of the scenarios that PHP is a very bad choice for. As previously mentioned, you can tie up all of your Apache workers very quickly doing something like this. PHP is built for start, execute, stop. It's not built for start, wait...execute, stop. You'll bog down your server very quickly and find that you have incredible scaling problems.
That said, you can still do this with PHP and have it not kill your server using the nginx HttpPushStreamModule: http://wiki.nginx.org/HttpPushStreamModule
You set up nginx in front of Apache (or whatever else) and it takes care of holding open the concurrent connections. You just respond with the payload by sending data to an internal address, which you could do with a background job, or you could fire the messages off to the people who were waiting whenever new requests come in. This keeps PHP processes from sitting open during long polling.
This is not exclusive to PHP and can be done using nginx with any backend language. nginx handles a load of concurrent open connections comparable to Node.js, so the biggest perk is that you no longer NEED Node for something like this.
You see a lot of other people mentioning other language libraries for accomplishing long polling and that's with good reason. PHP is just not well built for this type of behavior naturally.

Thanks for the code, dbr. Just a small typo in long_poller.htm around the line
1000 /* ..after 1 seconds */
I think it should be
"1000"); /* ..after 1 seconds */
for it to work.
For those interested, I tried a Django equivalent. Start a new Django project, say lp for long polling:
django-admin.py startproject lp
Call the app msgsrv for message server:
python manage.py startapp msgsrv
Add the following lines to settings.py to have a templates directory:
import os.path
PROJECT_DIR = os.path.dirname(__file__)
TEMPLATE_DIRS = (
    os.path.join(PROJECT_DIR, 'templates'),
)
Define your URL patterns in urls.py as such:
from django.conf.urls.defaults import patterns
from django.views.generic.simple import direct_to_template
from lp.msgsrv.views import retmsg
urlpatterns = patterns('',
    (r'^msgsrv\.php$', retmsg),
    (r'^long_poller\.htm$', direct_to_template, {'template': 'long_poller.htm'}),
)
And msgsrv/views.py should look like:
from random import randint
from time import sleep
from django.http import HttpResponse, HttpResponseNotFound
def retmsg(request):
    if randint(1,3) == 1:
        return HttpResponseNotFound('<h1>Page not found</h1>')
    else:
        sleep(randint(2,10))
        return HttpResponse('Hi! Have a random number: %s' % str(randint(1,10)))
Lastly, templates/long_poller.htm should be the same as above with typo corrected. Hope this helps.

Why not consider WebSockets instead of long polling? They are much more efficient and easy to set up. However, they are supported only in modern browsers. Here is a quick reference.

The WS-I group published something called "Reliable Secure Profile" that has GlassFish and .NET implementations that apparently interoperate well.
With any luck there is a JavaScript implementation out there as well.
There is also a Silverlight implementation that uses HTTP Duplex. You can connect JavaScript to the Silverlight object to get callbacks when a push occurs.
There are also commercial (paid) versions.

For an ASP.NET MVC implementation, look at SignalR, which is available on NuGet. Note that the NuGet package is often out of date compared to the Git source, which gets very frequent commits.
Read more about SignalR on Scott Hanselman's blog.

You can try icomet (https://github.com/ideawu/icomet), a C1000K C++ comet server built with libevent. icomet also provides a JavaScript library that is easy to use, as simple as:
var comet = new iComet({
sign_url: 'http://' + app_host + '/sign?obj=' + obj,
sub_url: 'http://' + icomet_host + '/sub',
callback: function(msg){
// on server push
alert(msg.content);
}
});
icomet supports a wide range of Browsers and OSes, including Safari(iOS, Mac), IEs(Windows), Firefox, Chrome, etc.

Simplest NodeJS
const http = require('http');
const server = http.createServer((req, res) => {
SomeVeryLongAction(res);
});
server.on('clientError', (err, socket) => {
socket.end('HTTP/1.1 400 Bad Request\r\n\r\n');
});
server.listen(8000);
// the long running task - simplified to setTimeout here
// but can be async, wait from websocket service - whatever really
function SomeVeryLongAction(response) {
setTimeout(() => response.end(), 10000); // arrow function keeps "this" bound to response
}
In a production scenario, in Express for example, you would get the response object in the middleware. Do what you need to do, scope all of the long-polled responses out to a Map or something similar (that is visible to other flows), and invoke response.end() whenever you are ready. There is nothing special about long-polled connections. The rest is just how you normally structure your application.
If you don't know what I mean by scoping out, this should give you an idea:
const http = require('http');
var responsesArray = [];
const server = http.createServer((req, res) => {
// not dealing with connection
// put it on stack (array in this case)
responsesArray.push(res);
// end this is where normal api flow ends
});
server.on('clientError', (err, socket) => {
socket.end('HTTP/1.1 400 Bad Request\r\n\r\n');
});
// and eventually when we are ready to resolve
// that if is there just to ensure you actually
// called endpoint before the timeout kicks in
function SomeVeryLongAction() {
if ( responsesArray.length ) {
let localResponse = responsesArray.shift();
localResponse.end();
}
}
// simulate some action out of endpoint flow
setTimeout(SomeVeryLongAction, 10000);
server.listen(8000);
As you can see, you could respond to all connections, to just one, or do whatever you want. If you assign an id to every request, you can keep the responses in a Map and access a specific one from outside the API call.
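A minimal sketch of that Map idea follows. The ids here are our own counter, not something Node provides, and hold, resolveOne and resolveAll are hypothetical helper names.

```javascript
const pending = new Map(); // id -> held response object
let nextId = 1;

// Park a response and return the id we assigned to it.
function hold(res) {
  const id = nextId++;
  pending.set(id, res);
  // forget the response if the connection closes before we answer
  if (typeof res.on === 'function') {
    res.on('close', () => pending.delete(id));
  }
  return id;
}

// Finish one specific held response; returns false if it is no longer waiting.
function resolveOne(id, body) {
  const res = pending.get(id);
  if (!res) return false;
  pending.delete(id);
  res.end(body);
  return true;
}

// Finish every held response with the same body; returns how many were answered.
function resolveAll(body) {
  const ids = [...pending.keys()];
  ids.forEach((id) => resolveOne(id, body));
  return ids.length;
}
```

Inside http.createServer you would call hold(res) instead of pushing onto the array, and any other part of the application can later call resolveOne(id, 'payload') or resolveAll('payload').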
