I've got a chat program which pushes JSON data from Apache/PHP to Node.js, via a TCP socket:
// Node.js (Javascript)
var net = require('net');
var phpListener = net.createServer(function(stream)
{
    stream.setEncoding("utf8");
    stream.on("data", function(txt)
    {
        var json = JSON.parse(txt);
        // do stuff with json
    });
});
phpListener.listen(8887, 'localhost');
// Apache (PHP)
$sock = stream_socket_client("tcp://localhost:8887");
$written = fwrite($sock, $json_string);
fclose($sock);
The problem is, if the JSON string is large enough (over around 8k), the output message gets split into multiple chunks, and the JSON parser fails. PHP returns the $written value as the correct length of the string, but the data event handler fires twice or more.
Should I be attaching the function to a different event, or is there a way to cache text across event fires, in a way that won't succumb to race conditions under heavy load? Or some other solution I haven't thought of?
Thanks!
You should try using a buffer to cache the data: Node.js hands you TCP data in whatever chunks it arrives in, so a single write from PHP can be delivered across several "data" events.
http://nodejs.org/api.html#buffers-2
You can buffer the whole request and then call your handler once the complete data has arrived.
TCP sockets don't handle message framing for you. How could they? TCP doesn't know what application-layer protocol you are using and therefore has no idea what a "message" is. It is up to you to design and implement another protocol on top of it and handle any necessary buffering.
But Node.js does have a built-in application-layer protocol on top of TCP that handles the buffering for you: the http module. If you use the http module instead of the net module for this, you won't need to worry about packet fragmentation and buffering.
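For what it's worth, the PHP side then becomes an ordinary HTTP request. A minimal sketch, assuming you switch the Node side to http.createServer() on the same port (the payload and URL here are made up):
<?php
// Hypothetical sketch: POST the JSON to a Node.js http server instead of a raw TCP socket.
$json_string = json_encode(array('user' => 'alice', 'msg' => 'hello'));

$context = stream_context_create(array(
    'http' => array(
        'method'  => 'POST',
        'header'  => "Content-Type: application/json\r\n" .
                     "Content-Length: " . strlen($json_string) . "\r\n",
        'content' => $json_string,
    ),
));

// file_get_contents() sends the whole body and blocks until Node replies,
// so there is no manual framing or buffering to worry about on either side.
$response = file_get_contents('http://localhost:8887/', false, $context);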
Related
I want to use stream_socket_client with the STREAM_CLIENT_PERSISTENT option for a stateful protocol, so some data exchange should happen only once, when the TCP connection is first established (e.g. authentication).
Can I distinguish whether stream returned by stream_socket_client is a 'new born' or it's reused one?
I'm talking about php-fpm, so using global variables to store the stream state is not an option, and using PHP sessions is too complicated, I guess.
Thanks.
Meanwhile I use the following workaround (yes, I know it's ugly...):
$chunk = stream_set_chunk_size($stream, 8193);
if ($chunk == 8193) {
    // It's an existing connection; a newborn one returns a different value (8192, mostly)
    return;
}
You can use ftell(): if it returns more than 0, then it's a reused connection.
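A rough sketch of that check (the address, timeout and handshake are placeholders; the point is just that ftell() reports a non-zero position on a stream that has already been used):
<?php
// Hypothetical sketch: detect whether a persistent stream is new or reused.
$stream = stream_socket_client(
    'tcp://127.0.0.1:8887',
    $errno,
    $errstr,
    30,
    STREAM_CLIENT_CONNECT | STREAM_CLIENT_PERSISTENT
);

if (ftell($stream) === 0) {
    // Fresh connection: do the one-time handshake (e.g. authentication) here.
    fwrite($stream, "AUTH secret\n");
} else {
    // Reused connection: the handshake already happened in an earlier request.
}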
I'm writing a simple HTTP server and I'd like to use php-cgi to handle POST requests. I wrote handlePOST as follows:
void KServer::handlePOST(int sockfd, string file, string pdata){
char* argvs[MAX_PARAMS];
argvs[0] = const_cast<char*>(CGI_PATH);
argvs[1] = const_cast<char*>(file.c_str());
stringstream ss;
ss<<pdata.length();
string clen, sname;
ss>>clen;
clen = "CONTENT_LENGTH=" + clen;
sname = "SCRIPT_FILENAME="+file;
char* env[] = {
"REQUEST_METHOD=POST",
"REDIRECT_STATUS=CGI",
const_cast<char*>(clen.c_str()),
const_cast<char*>(sname.c_str()),
"CONTENT_TYPE=application/x-www-form-urlencoded",
0
};
istringstream stream(pdata);
cin.rdbuf(stream.rdbuf());
argvs[2] = nullptr; // execve() expects a null-terminated argv
execve(argvs[0], argvs, env);
...
I know that php-cgi gets POST data from STDIN, so I put the POST data (pdata) into cin.rdbuf(). However, php-cgi fails to fetch the data from STDIN after execve. If I enter the string on the console instead, the program runs correctly.
It is highly improbable that your custom HTTP server, when it reads the socket, always manages to read exactly the number of bytes in the header portion of the HTTP message and nothing more before getting to this point, such that when it executes the external process the body portion of the message, containing the POST data, is still waiting to be read on its standard input.
More than likely, you are using the C++ I/O library, or maybe even the C stdio library, to read the standard input or socket. Which means, of course, that the built-in input buffering employed by iostreams or stdio has already swallowed a good chunk of the POST data from the body of the HTTP message that immediately follows its header. It's now waiting for you to continue reading on your std::cin, stdin, or whatever your code is actually reading.
It therefore follows that, sadly, when you execute the external process, it will be dismayed to find that the POST data it expects has already been consumed, either in entirety, or in part, on its standard input. Instead, it's sitting right here, in your process, waiting for you to resume reading your std::cin, stdin, or whatever.
In other words, for this to work, you cannot just execute an external process, and wash your hands of the whole mess. You will need to set up a pipe to the external process's standard input, and it is your responsibility to proceed to meticulously read the body of the HTTP message, that forms the POST data, just like you've just now finished reading its header, and then dump it into the pipe.
That means parsing the Content-Length header, to know exactly how many bytes are in the body, and reading exactly that (because you're not going to get an EOF, and rely on it, since the socket will remain open for your expected HTTP reply).
And, of course, you will also then need to carefully handle everything that happens as a result of writing to a pipe, i.e. gracefully handling the reader shutting down before consuming the entirety of the piped data, resulting in a broken pipe, etc...
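The same idea, sketched in PHP with proc_open() rather than the asker's C++, just to show the shape of it (the php-cgi path, script path and body are made up): spawn php-cgi with the CGI environment set, then write exactly Content-Length bytes of the body into its stdin through a pipe.
<?php
// Hypothetical sketch of piping a POST body into php-cgi.
$body = 'name=alice&msg=hello';          // the POST data you read off the socket
$env  = array(
    'GATEWAY_INTERFACE' => 'CGI/1.1',
    'REQUEST_METHOD'    => 'POST',
    'REDIRECT_STATUS'   => 'CGI',
    'SCRIPT_FILENAME'   => '/var/www/post.php',   // placeholder path
    'CONTENT_TYPE'      => 'application/x-www-form-urlencoded',
    'CONTENT_LENGTH'    => (string) strlen($body),
);
$spec = array(
    0 => array('pipe', 'r'),   // child's stdin  -- we write the body here
    1 => array('pipe', 'w'),   // child's stdout -- the CGI response
    2 => array('pipe', 'w'),   // child's stderr
);

$proc = proc_open('/usr/bin/php-cgi', $spec, $pipes, null, $env);
if (is_resource($proc)) {
    fwrite($pipes[0], $body);   // feed the POST data to the child's stdin
    fclose($pipes[0]);          // signal EOF so php-cgi stops reading
    $response = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    proc_close($proc);
    echo $response;
}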
I am trying to implement a realtime chat application using PHP. Is it possible to do it without using persistent data storage like a database or file? Basically what I need is a mediator written in PHP who
accepts messages from client browsers,
broadcasts the message to other clients,
forgets the message.
You should check out HTML5 WebSockets. They use a two-way connection, so you won't need any database or file. Any chat message that reaches the server is sent directly to the other users' browsers without any Ajax call. But you also need to set up a WebSocket server.
WebSockets are used in many realtime applications as well. I am planning to write a full tutorial on that shortly; I will notify you.
Just tried something I had never done before in response to this question. It seemed to work, but I only tested it once. Instead of using a socket, I had the idea of using a shared session variable. Basically I forced the session_id to be the same value regardless of the user, so they are all sharing the same data. From a quick test it seems to work. Here is what I did:
session_id('12345');
session_start();
$session_id = session_id();
$_SESSION['test'] = $_SESSION['test'] + 1;
echo "session: {$session_id} test: {$_SESSION['test']} <br />";
So my thought process was that you could simply store the chat info in a Session variable and force everyone regardless of who they are to use a shared session. Then you can simply use ajax to continually reload the current Session variable, and use ajax to edit the session variable when adding a message. Also you would probably want to set the Session to never expire or have a really long maxlifetime.
As I said I just played around with this for a few minutes to see if it would work.
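A sketch of what that might look like as two tiny Ajax endpoints (the file names and fields are made up; note the caveats raised further down about sessions being file-backed and locked per request):
<?php
// post.php -- hypothetical sketch: append a chat message to the shared session.
session_id('12345');          // force everyone onto the same session
session_start();
if (!isset($_SESSION['messages'])) {
    $_SESSION['messages'] = array();
}
$_SESSION['messages'][] = array(
    'user' => isset($_POST['user']) ? $_POST['user'] : 'anonymous',
    'text' => isset($_POST['text']) ? $_POST['text'] : '',
    'time' => time(),
);
session_write_close();        // release the session lock as soon as possible

<?php
// poll.php -- hypothetical sketch: return the shared message list as JSON.
session_id('12345');
session_start();
$messages = isset($_SESSION['messages']) ? $_SESSION['messages'] : array();
session_write_close();
header('Content-Type: application/json');
echo json_encode($messages);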
You will want to use Sockets. This article will cover exactly what you want to do: http://devzone.zend.com/209/writing-socket-servers-in-php/
When I tried to solve the same problem, I went with Nginx's Push Module. I chose to go this way since I had to support older browsers (that usually won't support WebSockets) and had no confidence in setting up an appropriate solution like Socket.io behind a TCP proxy.
The workflow went like this:
The clients connect through long-polling to my /subscriber location, which is open to all.
The /publisher location only accepts connections from my own server
When a client subscribes and talks, it basically just asks a PHP script to handle whatever data is sent.
This script can do validation, authorization, and such, and then forwards (via curl) the message in JSON format to the /publisher (see the sketch after this list).
Nginx's Push Module handles sending the message back to the subscribers and the client establishes a new long-polling connection.
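The forwarding step from the PHP handler to /publisher looked roughly like this (the URL, channel name and payload below are placeholders):
<?php
// Hypothetical sketch: forward a validated chat message to the Nginx push endpoint.
$payload = json_encode(array('user' => 'alice', 'text' => 'hello'));

$ch = curl_init('http://127.0.0.1/publisher?channel=chatroom-1');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);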
If I had to do this all over again, then I would definitely go the Socket.io route, as it has proper fallbacks to Comet-style long-polling and has great docs for both Client and Server scripts.
Hope this helps.
If you have a business need for PHP, then adding another language to the mix just means you have two problems.
It is perfectly possible to run a permanent, constantly-running daemonised PHP IRCd server: I know, because I've done it, to make an online game which ran for years.
The IRC server part I used is a modified version of WaveIRCd:
http://sourceforge.net/projects/waveircd/
I daemonised it using code I made available here:
http://www.thudgame.com/node/254
That code might be overkill: I wrote it to be as rugged as I could, so it tries to daemonise using PHP's pcntl_fork(), then falls back to calling itself recursively in the background, then falls back to perl, and so on: it also handles the security restrictions of PHP's safe mode in case someone turns that on, and the security restrictions imposed by being called through cron.
You could probably strip it down to just a few lines: the bits with the comments "Daemon Rule..." - follow those rules, and you'll daemonize your process just fine.
In order to handle any unexpected daemon deaths, etc, I then ran that daemoniser every minute through cron, where it checked to see if the daemon was already running, and if so either quietly died, or if the daemon was nonresponsive, killed it and took its place.
Because of the whole distributed nature of IRC, it was nicely rugged, and gave me a multiplayer browser game with no downtime for a good few years until bit-rot ate the site a few months back. I should try to rewrite the front end in Flash and get it back up again someday, when I have time...
(I then ran another daemonizer for a PHP bot to manage the game itself, then had my game connect to it as a java applet, and talk to the bot to play the game, but that's irrelevant here).
Since WaveIRCd is no longer maintained, it's probably worth having a hunt around to find if anyone else has forked the project and is supporting it.
[2012 edit: that said, if you want your front end to be HTML5/Javascript, or if you want to connect through the same port that HTTP connects through, then your options are more limited than when using Flash or Java. In that case, take the advice of others, and use "WebSockets" (poor support in most current browsers) or the "Socket.io" project (which uses WebSockets, but falls back to Flash, or various other methods, depending what the browser has available).
The above is for situations where your host allows you to run a service on another port. In particular, many have explicit rules in their ToS against running an IRCd.]
[2019 edit: WebSockets are now widely supported, you should be fine using them. As a relevant case study, Slack is written in PHP (per https://slack.engineering/taking-php-seriously-cf7a60065329), and for some time supported the IRC protocol, though I believe that that has since been retired. As its main protocol, it uses an API based on JSON over WebSockets (https://api.slack.com/rtm). This all shows that a PHP IRCd can deliver enterprise-level performance and quality, even where the IRC protocol is translated to/from another one, which you'd expect to give poorer performance.]
You need to use some kind of storage as a buffer. It IS plausible not to use a file or DB (which also uses a file underneath). You can try using PHP's shared memory functions, but I don't know of any ready-made solution, so you'll have to do it from scratch.
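If you do want to experiment with shared memory, a very rough sketch using the shmop extension might look like this (the key, segment size and serialization are arbitrary choices, and it ignores locking and overflow entirely):
<?php
// Hypothetical sketch: keep the last chat messages in a shared memory segment.
$key  = ftok(__FILE__, 'c');          // derive an IPC key from this file
$size = 65536;                        // 64 KB segment, arbitrary
$shm  = shmop_open($key, 'c', 0644, $size);

// Read whatever is currently stored (a serialized array, padded with NULs).
$raw      = rtrim(shmop_read($shm, 0, $size), "\0");
$messages = $raw !== '' ? unserialize($raw) : array();

// Append the new message and write the segment back.
$messages[] = array('user' => 'alice', 'text' => 'hello', 'time' => time());
shmop_write($shm, str_pad(serialize($messages), $size, "\0"), 0);
shmop_close($shm);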
Is it possible to do it without using a persistent data storage like
database or file?
It is possible, but you shouldn't do it. A database- or file-based approach doesn't slow a chat down, and it gives your chat application additional safety. You can still make a web-based chat using Ajax and sockets without persistent data.
You should see following posts:
Is database based chat room bad idea?
Will polling from a SQL DB instead of a file for chat application increase performance?
Using memcached as a database buffer for chat messages
persistent data in php question
https://stackoverflow.com/questions/6569754/how-can-i-develop-social-network-chat-without-using-a-database-for-storing-the-c
File vs database for storage efficiency in chat app
PHP is not a good fit for your requirements (in a normal setup like Apache + PHP, FastCGI, etc.), because the PHP script gets executed from top to bottom for every request and cannot maintain any state between requests without the use of external services or databases/files (except e.g. APC, http://php.net/manual/de/book.apc.php, but that is not intended for implementing a chat and will not scale to multiple servers).
You should definitely look at Node.js and especially its Socket.IO module (a WebSocket library). It's incredibly easy to use and rocks. Socket.IO can also scale to multiple chat servers with an optional Redis backend.
Trying to use $_SESSION with a static session id as communication channel is not a solution by the way, because PHP saves the session data into files.
One solution to achieving this is by writing a PHP socket server.
<?php
// Set time limit to indefinite execution
set_time_limit (0);
// Set the ip and port we will listen on
$address = '192.168.0.100';
$port = 9000;
$max_clients = 10;
// Array that will hold client information
$client = array(); // the rest of the script indexes this as $client[$i]
// Create a TCP Stream socket
$sock = socket_create(AF_INET, SOCK_STREAM, 0);
// Bind the socket to an address/port
socket_bind($sock, $address, $port) or die('Could not bind to address');
// Start listening for connections
socket_listen($sock);
// Loop continuously
while (true) {
// Set up the array of sockets to watch, starting with the listening socket
$read = array($sock);
for ($i = 0; $i < $max_clients; $i++)
{
    if (isset($client[$i]['sock']))
        $read[$i + 1] = $client[$i]['sock'];
}
// Set up a blocking call to socket_select()
$write = null;
$except = null;
$ready = socket_select($read, $write, $except, null);
/* if a new connection is being made add it to the client array */
if (in_array($sock, $read)) {
for ($i = 0; $i < $max_clients; $i++)
{
if (!isset($client[$i]['sock'])) {
$client[$i]['sock'] = socket_accept($sock);
break;
}
elseif ($i == $max_clients - 1)
print("too many clients");
}
if (--$ready <= 0)
continue;
} // end if in_array
// If a client is trying to write - handle it now
for ($i = 0; $i < $max_clients; $i++) // for each client
{
if (isset($client[$i]['sock']) && in_array($client[$i]['sock'], $read))
{
    $input = socket_read($client[$i]['sock'], 1024);
    if ($input === false || $input === '') {
        // Connection was closed by the client
        socket_close($client[$i]['sock']);
        unset($client[$i]);
        continue;
    }
    $input = trim($input);
    if ($input == 'exit') {
        // Client requested disconnect
        socket_close($client[$i]['sock']);
        unset($client[$i]);
    } elseif ($input != '') {
        // Strip whitespace and write back to the client
        $output = preg_replace("/[ \t\n\r]/", "", $input) . chr(0);
        socket_write($client[$i]['sock'], $output);
    }
}
}
} // end while
// Close the master sockets
socket_close($sock);
?>
You would execute this from the command line, and it would always have to be running for your PHP clients to connect to it. You could then write a PHP client that would connect to the socket.
<?php
$fp = fsockopen("www.example.com", 80, $errno, $errstr, 30);
if (!$fp) {
echo "$errstr ($errno)<br />\n";
} else {
$out = "GET / HTTP/1.1\r\n";
$out .= "Host: www.example.com\r\n";
$out .= "Connection: Close\r\n\r\n";
fwrite($fp, $out);
while (!feof($fp)) {
echo fgets($fp, 128);
}
fclose($fp);
}
?>
You would then have to use some kind of Ajax call, e.g. with jQuery, to post the message to this PHP client (see the sketch after the links below).
http://devzone.zend.com/209/writing-socket-servers-in-php/
http://php.net/manual/en/function.fsockopen.php
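The glue between the Ajax call and the socket server could be a small PHP endpoint like this, which your jQuery code would POST to (the parameter name is made up; the address and port reuse the ones from the server example above):
<?php
// send.php -- hypothetical sketch: relay an Ajax-posted message to the socket server.
$message = isset($_POST['message']) ? trim($_POST['message']) : '';
if ($message === '') {
    exit;
}

$fp = fsockopen('192.168.0.100', 9000, $errno, $errstr, 5);
if (!$fp) {
    echo "$errstr ($errno)";
} else {
    fwrite($fp, $message . "\n");
    echo fgets($fp, 1024);   // echo the server's reply back to the browser
    fclose($fp);
}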
Better to use a Node.js server for this. WebSockets aren't cross-browser nowadays (except Socket.io for Node.js, which works perfectly).
In short: you can't.
Current HTTP/HTML implementations don't support server push, so the algorithm of your chat app has to look like this:
A: sends a message
B, C, D: repeatedly check whether a new message has been sent and, if so, fetch it.
So the receivers always have to make a new request and check whether a new message has been sent (an AJAX call or something similar).
That means there is always a delay between the send event and the receive event,
which means the data must be saved in something global, like a DB or the file system.
Take a look at:
http://today.java.net/article/2010/03/31/html5-server-push-technologies-part-1
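A minimal sketch of that receiver-side polling in PHP (the storage lookup and message format are made up; in practice this would read your DB or file):
<?php
// poll.php -- hypothetical sketch: the endpoint that B, C and D poll via Ajax.
// Returns every message newer than the id the client saw last.
$since = isset($_GET['since']) ? (int) $_GET['since'] : 0;

// Placeholder: fetch_messages_since() stands in for your DB/file lookup.
$new_messages = fetch_messages_since($since);

header('Content-Type: application/json');
echo json_encode($new_messages);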
You didn't say it had to all be written in PHP :)
Install RabbitMQ, and then use this chat implementation built on top of websockets and RabbitMQ.
Your PHP is pretty much just 'chat room chrome'. It's possible most of your site would fit within the 5 meg limit of offline HTML5 content, and you have a very flexible (and likely more robust than if you did it yourself) chat system.
It even has 20 messages of chat history if you leave the room.
https://github.com/videlalvaro/rabbitmq-chat
If you need to use just PHP, then you can store chat messages in session variables; a session can act like an object and store a lot of information.
If you can use jQuery, then you could just append a paragraph to a div after a message has been sent, but if the site is refreshed the messages will be gone.
Or combine the two: store the messages in the session and update them with jQuery and Ajax.
Try looking into socket libraries like ZeroMQ. They allow for near-instant transport of messages, add very little overhead on top of TCP, and are built for realtime use. Their infrastructure allows data to be sent between points A and B without being stored anywhere first (although you can still choose to store it).
Here's a tutorial for a chat client in ZeroMQ
Here is my problem: I have a script (let's call it comet.php) which is requested by an AJAX client script and waits for a change to happen, like this:
while(no_changes){
usleep(100000);
//check for changes
}
I don't like this too much; it's not very scalable and it's (imho) "bad practice".
I would like to improve this behaviour with a semaphore(?) or some other concurrent programming technique. Can you please give me some tips on how to handle this? (I know it's not a short answer, but a starting point would be enough.)
Edit: what about LibEvent?
You can solve this problem using ZeroMQ.
ZeroMQ is a library that provides supercharged sockets for plugging things (threads, processes and even separate machines) together.
I assume you're trying to push data from the server to the client. Well, a good way to do that is using the EventSource API (polyfills available).
client.js
Connects to stream.php through EventSource.
var stream = new EventSource('stream.php');
stream.addEventListener('debug', function (event) {
var data = JSON.parse(event.data);
console.log([event.type, data]);
});
stream.addEventListener('message', function (event) {
var data = JSON.parse(event.data);
console.log([event.type, data]);
});
router.php
This is a long-running process that listens for incoming messages and sends them out to anyone listening.
<?php
$context = new ZMQContext();
$pull = $context->getSocket(ZMQ::SOCKET_PULL);
$pull->bind("tcp://*:5555");
$pub = $context->getSocket(ZMQ::SOCKET_PUB);
$pub->bind("tcp://*:5556");
while (true) {
$msg = $pull->recv();
echo "publishing received message $msg\n";
$pub->send($msg);
}
stream.php
Every user connecting to the site gets his own stream.php. This script is long-running and waits for any messages from the router. Once it gets a new message, it will output this message in EventSource format.
<?php
$context = new ZMQContext();
$sock = $context->getSocket(ZMQ::SOCKET_SUB);
$sock->setSockOpt(ZMQ::SOCKOPT_SUBSCRIBE, "");
$sock->connect("tcp://127.0.0.1:5556");
set_time_limit(0);
ini_set('memory_limit', '512M');
header("Content-Type: text/event-stream");
header("Cache-Control: no-cache");
while (true) {
$msg = $sock->recv();
$event = json_decode($msg, true);
if (isset($event['type'])) {
echo "event: {$event['type']}\n";
}
$data = json_encode($event['data']);
echo "data: $data\n\n";
ob_flush();
flush();
}
To send messages to all users, just send them to the router. The router will then distribute that message to all listening streams. Here's an example:
<?php
$context = new ZMQContext();
$sock = $context->getSocket(ZMQ::SOCKET_PUSH);
$sock->connect("tcp://127.0.0.1:5555");
$msg = json_encode(array('type' => 'debug', 'data' => array('foo', 'bar', 'baz')));
$sock->send($msg);
$msg = json_encode(array('data' => array('foo', 'bar', 'baz')));
$sock->send($msg);
This should prove that you do not need node.js to do realtime programming. PHP can handle it just fine.
Apart from that, socket.io is a really nice way of doing this. And you could connect socket.io to your PHP code via ZeroMQ easily.
See also
ZeroMQ
ZeroMQ PHP Bindings
ZeroMQ is the Answer - Ian Barber (Video)
socket.io
It really depends on what you are doing in your server side script. There are some situations in which your have no option but to do what you are doing above.
However, if you are doing something which involves a call to a function that will block until something happens, you can use that call to avoid busy-waiting, instead of the usleep() loop (which is IMHO the part that would be considered "bad practice").
Say you were waiting for data from a file or some other kind of stream that blocks. You could do this:
while (($str = fgets($fp)) === FALSE) continue;
// Handle the event here
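If the thing you are waiting on is a stream, stream_select() gives you the same blocking behaviour with a timeout, which is a bit friendlier than looping on fgets() (the stream and timeout below are placeholders):
<?php
// Hypothetical sketch: block until data is available on $fp, or 30 seconds pass.
$read   = array($fp);
$write  = null;
$except = null;

if (stream_select($read, $write, $except, 30) > 0) {
    $str = fgets($fp);
    // Handle the event here
} else {
    // Timed out with no data; answer the long-poll request with "nothing new"
}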
Really, PHP is the wrong language for doing stuff like this. But there are situations (I know because I have dealt with them myself) where PHP is the only option.
As much as I like PHP, I must say that PHP isn't the best choice for this task.
Node.js is much, much better for this kind of thing and it scales really well. It's also pretty simple to implement if you have JS knowledge.
Now, if you don't want to waste CPU cycles, you have to create a PHP script that will connect to a server of some sort on a certain port. The specified server should listen for connections on the chosen port and every X amount of time check for whatever you want to check (db entries for new posts for example) and then it dispatches the message to every connected client that the new entry is ready.
Now, it's not that difficult to implement this event queue architecture in PHP, but it'd take you literally 5 minutes to do this with Node.js and Socket.IO, without worrying whether it'll work in majority of browsers.
I agree with the consensus here that PHP isn't the best solution here. You really need to be looking at dedicated realtime technologies for the solution to this asynchronous problem of delivering data from your server to your clients. It sounds like you are trying to implement HTTP-Long Polling which isn't an easy thing to solve cross-browser. It's been tackled numerous times by developers of Comet products so I'd suggest you look at a Comet solution, or even better a WebSocket solution with fallback support for older browsers.
I'd suggest that you let PHP do the web application functionality that it's good at and choose a dedicated solution for your realtime, evented, asynchronous functionality.
You need a realtime library.
One example is Ratchet http://socketo.me/
The part that takes care of the pub sub is discussed at http://socketo.me/docs/wamp
The limitation here is that PHP also needs to be the side that publishes the changed data.
In other words, this won't magically let you subscribe to MySQL updates. But if you can edit the code that writes to MySQL, then you can add the publish part there.
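To give an idea of the shape of it, here is a sketch closely following the "hello world" style example in Ratchet's docs: a small class that re-broadcasts every incoming message to the other connections. Treat the details as an illustration, not a drop-in chat server.
<?php
// Hypothetical sketch based on Ratchet's documented example
// (installed via Composer as cboden/ratchet, with vendor/autoload.php required).
use Ratchet\MessageComponentInterface;
use Ratchet\ConnectionInterface;
use Ratchet\Server\IoServer;

class Chat implements MessageComponentInterface {
    protected $clients;

    public function __construct() {
        $this->clients = new \SplObjectStorage;
    }

    public function onOpen(ConnectionInterface $conn) {
        $this->clients->attach($conn);
    }

    public function onMessage(ConnectionInterface $from, $msg) {
        // Re-broadcast every incoming message to all other connections.
        foreach ($this->clients as $client) {
            if ($client !== $from) {
                $client->send($msg);
            }
        }
    }

    public function onClose(ConnectionInterface $conn) {
        $this->clients->detach($conn);
    }

    public function onError(ConnectionInterface $conn, \Exception $e) {
        $conn->close();
    }
}

// Port 8080 is arbitrary. This bare form is a plain socket server; for browser
// WebSocket clients, wrap Chat in Ratchet's HttpServer/WsServer as its docs show.
$server = IoServer::factory(new Chat(), 8080);
$server->run();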
Do I need to pass back any HTTP headers to tell the browser that my server won't be immediately closing the connection and to display as the HTML is received? Is there anything necessary to get the HTML to incrementally display like flush()?
This technique used to be used for things like chat, but I'm thinking about using it for a COMET type application.
Long polling is a common technique to do something like this; to briefly summarise, it works as follows:
The client sends an XHR to the server.
If there is data ready, the server returns this immediately.
If not, the server keeps the connection open until data does become available, then it returns this.
If the request times-out, go back to 1).
The page running on the client receives this data, and does what it does with it.
Go back to 1)
This is how Facebook implements its chat feature.
This article also clears up some of the misconceptions of long-polling, and details some of the benefits of doing so.
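On the PHP side, the server half of step 3 is usually just a bounded wait loop. A rough sketch (the check_for_new_data() helper is a placeholder for your actual DB/file/queue lookup):
<?php
// poll.php -- hypothetical sketch of the server side of a long-poll request.
set_time_limit(0);
$timeout = 30;              // give up after 30 seconds and let the client retry
$started = time();

while (time() - $started < $timeout) {
    $data = check_for_new_data();      // placeholder for your own lookup
    if ($data !== null) {
        header('Content-Type: application/json');
        echo json_encode($data);
        exit;                          // step 3: return as soon as data is ready
    }
    usleep(250000);                    // wait a quarter second before checking again
}

// Step 4: timed out with nothing new; the client goes back to step 1.
http_response_code(204);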
The client will close the connection when it does not receive any data for a certain time. This timeout cannot be influenced by HTTP headers. It is client-specific and usually set to 120 seconds IIRC.
So all you have to do is send small amounts of data regularly to avoid hitting the timeout.
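In PHP that boils down to periodically emitting something harmless and flushing it, for example (the interval and padding are arbitrary):
<?php
// Hypothetical sketch: keep a long-lived response alive with a small heartbeat.
set_time_limit(0);
while (!connection_aborted()) {
    echo "<!-- keep-alive -->\n";   // a comment the page can safely ignore
    @ob_flush();                    // flush PHP's output buffer, if one is active
    flush();                        // flush the web server's buffer to the browser
    sleep(30);                      // well inside the ~120 second client timeout
}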
I think a more robust solution is a page with a Javascript timer that polls the server for new data. Keeping the response open is not something the HTTP protocol was designed for.
I would just echo / print the HTML as I went. There are a few different ways you can have the script pause before sending the next bit. You shouldn't need to do anything with headers or any special code to tell the browser to wait, although calling flush() helps make sure output isn't held in PHP's buffer. As long as your script is still running, the browser will render the HTML as it receives it from the script.
echo "<HTML><HEAD.../HEAD><BODY>";
while (running)
{
echo "printing html... </br>";
}
echo "</BODY></HTML>"; //all done
Try a forever frame (like in Gmail).
All of these techniques are just hacks; HTTP isn't designed to do this.
At the end of your script, use something like this (assuming you had output buffering on by putting ob_start() at the top of your page):
<?php
set_time_limit(0); // Stop PHP from closing script after 30 seconds
ob_start();
echo str_pad('', 1024 * 1024, 'x'); // Dummy 1 megabyte string
$buffer = ob_get_clean();
while (isset($buffer[0])) {
$send = substr($buffer, 0, 1024 * 30); // Take the next 30 KB from the buffer
$buffer = substr($buffer, 1024 * 30); // Shorten the buffer
echo $send; // Send this chunk
echo '<br />'; // Nudges some browsers into rendering the new content
@ob_flush(); // Flush PHP's output buffer, if one is still active
flush(); // Flush the web server's buffer to the browser
sleep(1); // Sleep for 1 second
}
?>
That script basically outputs 1 megabyte of text at (a simulated) 30 KB per second, no matter how fast the connection between the user and the server is.
Depending on what you are doing, you could just echo as your script proceeds; this will send the HTML to the browser as it is echoed.
I would suggest you investigate implementing such functionality using Ajax rather than plain old HTML. This allows you much more flexibility in terms of architectural design and user interface.