Hi, I'm looking for a way to implement a coroutine in a PHP file. The idea is that I have long processes that need to be able to yield for potentially hours or days. So other PHP files will be calling functions in the same file as the coroutine to update something, then call a function like $coroutine->process() that causes the coroutine to continue from its last yield. This is to avoid having to use a large state machine.
I'm thinking that the coroutine php file will not actually be running when it's idle, but that when given processing time, will enter from the top and use something like a switch or goto to restart from the previous yield. Then when it reaches the next yield, the file will save its current state somewhere (like a session or database) and then exit.
Has anyone heard of this, or a metaphor similar to this? Bonus points for aggregating and managing multiple coroutines under one collection somehow, perhaps with support for a thread-like join so that flow continues in one place when they finish (a bit like Go).
UPDATE: php 5.5.0 has added support for generators and coroutines:
https://github.com/php/php-src/blob/php-5.5.0/NEWS
https://wiki.php.net/rfc/generators
I have not tried it yet, so perhaps someone can suggest a barebones example. I'm trying to convert a state machine into a coroutine. So for example a switch command inside of a for loop (whose flow is difficult to follow, and error prone as more states are added) converted to a cooperative thread where each decision point is easy to see in an orderly, linear flow that pauses for state changes at the yield keyword.
A concrete example of this: imagine you are writing an elevator controller. Instead of determining whether to read the state of the buttons based on the elevator's state (STATE_RISING, STATE_LOWERING, STATE_WAITING, etc.), you write one loop with sub-loops that run while the elevator is in each state. So while it's rising, it won't lower, and it won't read any buttons except the emergency button. This may not seem like a big deal, but in a complex state machine like a chat server, it can become almost impossible to update a state machine without introducing subtle bugs. The cooperative thread (coroutine) version, by contrast, has a plainly visible flow that's easier to debug.
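A barebones sketch of the elevator idea using a PHP 5.5+ generator might look like the following. The function name elevator, the status strings, and the convention of passing the target floor in through send() are all assumptions made for illustration; none of them come from a library.

```php
<?php
// Hypothetical elevator controller written as a generator-based coroutine.
function elevator($floor = 0)
{
    while (true) {
        // Waiting state: pause here until a target floor is sent in.
        $target = (yield "waiting at $floor");

        // Rising / lowering state: one yield per floor travelled.
        // No button input is read while this inner loop runs.
        while ($floor !== $target) {
            $floor += ($target > $floor) ? 1 : -1;
            yield "moving, now at $floor";
        }
    }
}

$lift = elevator();
echo $lift->current(), "\n"; // waiting at 0
echo $lift->send(2), "\n";   // moving, now at 1
$lift->next();
echo $lift->current(), "\n"; // moving, now at 2
$lift->next();
echo $lift->current(), "\n"; // waiting at 2
```

Each state is a plain loop, and the generator remembers where it paused, so no explicit state constant is needed. The generator only lives as long as the PHP process that created it.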
The Swoole coroutine library provides Go-like coroutines for PHP. Each coroutine adds only 8K of RAM per process. It provides a coroutine API with the basic functions expected (such as yield and resume), coroutine utilities such as a coroutine iterator, as well as higher-level coroutine built-ins such as filesystem functions and networking (socket clients and servers, Redis client and server, MySQL client, etc.).
A second element to your question is the ability to have long lived coroutines - this likely isn't a good idea unless you are saving the state of the coro in a session and allowing the coro to end/close. Otherwise the request will have to live as long as the coroutine. If the service is being hosted by a long lived PHP script the scenario is easier and the coroutine will simply live until it is allowed to / forced to close.
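As far as I know, PHP generator objects cannot be serialized, so a coroutine that has to survive between requests needs its state saved by hand. A minimal sketch of that idea, assuming a JSON file as the store, is below. The function names (run_segment, load_state, save_state, process), the step numbers, and the log messages are all hypothetical.

```php
<?php
// Hypothetical resumable process. Each call runs one segment (the code
// between two "yield" points) and returns the state to persist.
function run_segment(array $state)
{
    switch ($state['step']) {
        case 0:
            $state['log'][] = 'started, waiting for input';
            $state['step'] = 1; // first yield point
            break;
        case 1:
            $state['log'][] = 'input received, waiting for approval';
            $state['step'] = 2; // second yield point
            break;
        default:
            $state['log'][] = 'finished';
            $state['done'] = true;
    }
    return $state;
}

// Load the saved state, or start fresh when nothing has been saved yet.
function load_state($file)
{
    if (!is_file($file)) {
        return array('step' => 0, 'log' => array(), 'done' => false);
    }
    return json_decode(file_get_contents($file), true);
}

function save_state($file, array $state)
{
    file_put_contents($file, json_encode($state), LOCK_EX);
}

// Resume from the last yield, run to the next one, save, and return.
function process($file)
{
    $state = load_state($file);
    if (!$state['done']) {
        $state = run_segment($state);
        save_state($file, $state);
    }
    return $state;
}
```

A request would call process() with the path of the state file and then exit; the next request, hours or days later, picks up at the following segment. A session or database row could replace the file.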
Swoole performs comparably to Node.js and Go based services, and is used in multiple production services that regularly host 500K+ TCP connections. It is a little known gem for PHP, largely because it is developed in China and most support and documentation is limited to Chinese speakers, although a small handful of individuals strive to help those who speak other languages.
One nice point for Swoole is that its PHP classes wrap an expansive C/C++ API designed from the start to allow all of its features to be used without PHP. The same source can easily be compiled as both a PHP extension and/or a standard library for both *NIX systems and Windows.
PHP does not support coroutines.
I would write a PHP extension with setcontext(), of course assuming you are targeting Unix platforms.
Here is a StackOverflow question about getting started with PHP extensions: Getting Started with PHP Extension-Development.
Why setcontext()? It is a little known fact that setcontext() can be used for coroutines. Just swap the context when calling another coroutine.
I am writing a second answer because there seems to be a different approach to PHP coroutines.
With Comet, HTTP responses are long-lived. Small <script> chunks are sent from time to time and the JavaScript is executed by the browser as they arrive. The response can pause for a long time waiting for an event. In 2001 I wrote a small hobby chat server in Java exploiting this technique. I was abroad for half a year and was homesick and used this to chat with my parents and my friends at home.
The chat server showed me that it is possible that a HTTP request triggers other HTTP responses. This is somewhat like a coroutine. All the HTTP responses are waiting for an event and if the event applies for a response, it takes up processing and then goes sleeping again, after having triggered some other response.
You need a medium over which the PHP "processes" communicate with each other. Files are a simple medium, but I think a database would be a better fit. My old chat server used a log file. Chat messages were appended to the log file and all chat processes were continually reading from the end of the log file in an endless loop. PHP supports sockets for direct communication, but this needs a different setup.
To get started, I propose these two functions:
function get_message() {
# Check medium. Return a message; or NULL if there are no messages waiting.
}
function send_message($message) {
# Write a message to the medium.
}
Your coroutines loop like this:
while (1) {
    sleep(1); // go easy on the CPU
    $message = get_message();
    if ($message === NULL) continue;
    # Your coroutine is now active. Act on the message.
    # You can send messages to other coroutines.
    # You also can send <script> chunks to the browser, like this:
    echo '<script type="text/javascript">';
    echo '// Your JavaScript code';
    echo '</script>';
    flush();
    # Yield
}
To yield use continue, because it restarts the while (1) loop waiting for messages. The coroutine also yields at the end of the loop.
You can give your coroutines IDs and/or devise a subscription model in which some coroutines listen to some messages but not all.
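The two functions proposed above could be backed by a log file, roughly as my old chat server did. This is only a sketch under assumptions: the extra $file and $offset parameters, the one-JSON-message-per-line format, and the lack of read locking are my own simplifications, not part of the original proposal.

```php
<?php
// Append one message to the shared log file (the "medium").
function send_message($message, $file)
{
    // One JSON-encoded message per line; LOCK_EX avoids interleaved writes.
    file_put_contents($file, json_encode($message) . "\n", FILE_APPEND | LOCK_EX);
}

// Return the next unread message, or NULL if there are no messages waiting.
// $offset is each reader's own position in the file and is advanced in place.
function get_message($file, &$offset)
{
    clearstatcache(true, $file);
    if (!is_file($file) || filesize($file) <= $offset) {
        return null;
    }
    $handle = fopen($file, 'r');
    fseek($handle, $offset);
    $line = fgets($handle);
    $offset = ftell($handle);
    fclose($handle);
    return json_decode(trim($line), true);
}
```

Because every reader keeps its own offset, all coroutines see every message, which matches the chat use case. A subscription model could filter on a field inside the message.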
Edit:
Sadly PHP and Apache are not a very good fit for a scalable solution. Even if most of the time the coroutines don't do anything, they hog memory as processes, and Apache starts thrashing memory if there are too many of them, maybe at a few thousand coroutines. Java is not very much better, but since my chat server was private, I didn't experience performance problems. There were never more than 10 users accessing it simultaneously.
Nginx, Node.js or Erlang have this solved in a better way.
Related
I have a php script on my website that is designed to give a nice overview of a domain name the user enters. It does this job quite well, however it is very slow. This might have something to do with the fact it's checking an array of 64 possible domain names, and THEN moving on to checking nameservers for A records/MX records/NS records etc.
What I would like to know is: is it possible to run multiple threads/child processes of this, so that it will check multiple elements of the array at once and generate the output a lot faster?
I've put an example of my code in a pastebin (so to avoid creating a huge and spammy post on here)
http://pastebin.com/Qq9qKtP9
In perl I can do something like this:
$fork = new Parallel::ForkManager($threads);
foreach(Something here){
$fork->start and next;
$fork->finish;
}
And I could make the loop run in as many processes as needed. Is something similar possible in PHP, or are there any other ways you can think of to speed this up? The main issue is that Cloudflare has a timeout, and often the script takes long enough that Cloudflare blocks the response.
Thanks
* Never Mind Support !! *
You never want to create threads (or additional processes for that matter) in direct response to a web request.
If your frontend is instructed to create 60 threads every time someone clicks on page.php, and 100 people come along and request page.php at once, you will be asking your hardware to create and execute 6000 threads concurrently, to say nothing of the threads used by operating system services and other software. For obvious reasons, this does not, and will never scale.
Rather you want to separate out those parts of the application that require additional threads or processes and communicate with this part of the application via some kind of sane RPC. This means that the backend of the application can utilize concurrency via pthreads or forking, using a fixed number of threads or processes, and spreading work as evenly as possible across all available resources. This allows for an influx of traffic; it allows your application to scale.
I won't write example code, it seems altogether too trivial.
The first thing you want to do is optimize your code to shorten the execution time as much as possible.
For example, instead of making five dns queries:
$NS = dns_get_record($murl, DNS_NS);
$MX = dns_get_record($murl,DNS_MX);
$SRV = dns_get_record($murl,DNS_SRV);
$A = dns_get_record($murl,DNS_A);
$TXT = dns_get_record($murl,DNS_TXT);
You can call dns_get_record just once, combining the type constants with bitwise OR:
$DATA = dns_get_record($murl, DNS_NS | DNS_MX | DNS_SRV | DNS_A | DNS_TXT);
and parse out the variables from there.
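Parsing out the variables could be as simple as grouping the records by their 'type' key, which each record returned by dns_get_record carries. The helper name group_dns_records and the sample records below are made up for illustration; real data would come from the combined call.

```php
<?php
// Group the flat list returned by dns_get_record() by record type.
function group_dns_records(array $records)
{
    $grouped = array();
    foreach ($records as $record) {
        $grouped[$record['type']][] = $record;
    }
    return $grouped;
}

// Sample data shaped like dns_get_record() output (invented values).
$DATA = array(
    array('host' => 'example.com', 'type' => 'A', 'ip' => '192.0.2.1'),
    array('host' => 'example.com', 'type' => 'MX', 'pri' => 10, 'target' => 'mail.example.com'),
    array('host' => 'example.com', 'type' => 'NS', 'target' => 'ns1.example.com'),
    array('host' => 'example.com', 'type' => 'NS', 'target' => 'ns2.example.com'),
);

$byType = group_dns_records($DATA);
$NS = isset($byType['NS']) ? $byType['NS'] : array();
$MX = isset($byType['MX']) ? $byType['MX'] : array();
$A  = isset($byType['A'])  ? $byType['A']  : array();
```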
Instead of outright forking processes to handle several parts concurrently, I'd implement a queue that all of the requests would get pushed into. The query processor would be limited as to how many items it could process at once, avoiding the potential DoS if hundreds or thousands of requests hit your site at the same time. Without some sort of limiting mechanism, you'd end up with so many processes that the server might hang.
As for the processor, in addition to the previously mentioned items, you could try pecl/Gearman as your queue processor. I haven't used it, but it appears to do what you're looking for.
Another method to optimize this would be implementing a caching system, that saved the results for, say, a week (or whatever). This would cut down on someone looking up the same site repeatedly in a day (or running a script on your site).
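A rough sketch of such a cache, assuming plain JSON files in the temp directory, could look like this. The function name cached_lookup, the file naming scheme, and the $dir parameter are assumptions; APC or memcached would do the same job.

```php
<?php
// Return a cached value for $key if it is younger than $ttl seconds,
// otherwise call $compute, store its result, and return it.
function cached_lookup($key, $ttl, $compute, $dir = null)
{
    $dir = ($dir === null) ? sys_get_temp_dir() : $dir;
    $file = $dir . '/lookup_' . md5($key) . '.json';

    clearstatcache(true, $file);
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return json_decode(file_get_contents($file), true);
    }

    $value = call_user_func($compute);
    file_put_contents($file, json_encode($value), LOCK_EX);
    return $value;
}

// Usage idea: cache one domain's DNS data for a week (604800 seconds).
// $data = cached_lookup($murl, 604800, function () use ($murl) {
//     return dns_get_record($murl, DNS_NS | DNS_MX | DNS_A);
// });
```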
I doubt that it's a good idea to fork the Apache process from PHP. But if you really want to, there is PCNTL (which is not available in the Apache module).
You might have more fun with pthreads. Nowadays you can even download a PHP build which claims to be thread-safe.
And finally you have the possibility to use classic non blocking IO which I would prefer in the case of PHP.
I have a php based web application that captures certain events in a database table. It also features a visualization of those captured events: a html table listing the events which is controlled by ajax.
I would like to add an optional 'live' feature: after pressing a button ('switch on') all events captured from that moment on will be inserted into the already visible table. Three things have to happen: noticing the event, fetching the events data and inserting it into the table. To keep the server load inside sane limits I do not want to poll for new events with ajax request, instead I would prefer the long polling strategy.
The problem with this is obviously that when doing a long polling ajax call, the server's counterpart has to monitor for an event. Since the events are registered by php scripts, there is no easy way to notice that event without polling the database for changes again. This is because the capturing action runs in a different process from the observing long polling request. I looked around to find a usable mechanism for such inter-process communication as I know it from rich clients under Linux. Indeed there are php extensions for semaphores, shared memory or even posix. However, they all exist only under Linux (or Unix-like) systems. Although it is not typical, the application might be used under MS Windows systems in rare cases.
So my simple question is: is there any means that is typically available on all (most) systems that can push such events to a php script servicing the long polling ajax request ? Something without polling a file or a database constantly, since I already have an event elsewhere ?
So, the initial caveats: without doing something "special", trying to do long polling with vanilla PHP will eat up resources until you kill your server.
Here is a good basic guide to basic PHP based long polling and some of the challenges associated with going the "simple" road:
How do I implement basic "Long Polling"?
As far as doing this really cross-platform (and simple enough to start), you may need to fall back to some sort of simple internal polling - but the goal should be to ensure that this action is much lower-cost than having the client poll.
One route would be to essentially treat it like you're caching database calls (which you are at this point), and go with some standard caching approaches. Everything from APC, to memcached, to polling a file, will all likely put less load on the server than having the client set up and tear down a connection every second. Have one process place data in the correct keys, and then poll them in your script on a regular basis.
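As a sketch of the "poll a file" variant, assume the capturing script writes the id of the latest event into a small counter file. The long-polling script can then check that file cheaply until something newer shows up or a timeout passes. The function name wait_for_event, the counter file format, and the default timings are assumptions.

```php
<?php
// Block until the counter file holds an id greater than $lastId,
// or until $timeout seconds have passed. Returns the new id or NULL.
function wait_for_event($file, $lastId, $timeout = 25.0, $interval = 0.25)
{
    $deadline = microtime(true) + $timeout;
    do {
        clearstatcache(true, $file);
        $current = is_file($file) ? (int) file_get_contents($file) : 0;
        if ($current > $lastId) {
            return $current;
        }
        usleep((int) ($interval * 1000000)); // go easy on the CPU
    } while (microtime(true) < $deadline);

    return null; // nothing new; the client simply opens a new long poll
}
```

Only when this returns an id does the script need to touch the database to fetch the new rows.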
Here is a pretty good overview of a variety of caching options that might be crossplatform enough for you:
http://simas.posterous.com/php-data-caching-techniques
Once you reach the limits of this approach, you'll probably have to move onto a different server architecture anyhow.
I'm trying to make a theoretical web chat application with php and jquery, I've read about long polling and http streaming, and I managed to apply most principles introduced in the articles. However, there are 2 main things I still can't get my head around.
With Long Polling
How will the server know when an update has been sent? Will it need to query the database continually or is there a better way?
With HTTP Streaming
How do I check for the results during the Ajax connection is still active? I'm aware of jQuery's success function for ajax calls, but how do I check the data while the connection is still ongoing?
I'll appreciate any and all answers, thanks in advance.
Yeah, Comet-like techniques usually blow your mind in the beginning; they just make you think in a different way. Another problem is that there are not that many resources available for PHP, because everyone's doing their Comet in node.js, Python, Java, etc.
I'll try to answer your questions, hope it would shed some light on this topic for people.
How will the server know when an update has been sent? Will it need to query the database continually or is there a better way?
The answer is: in the most general case you should use a message queue (MQ). RabbitMQ or the Pub/Sub functionality built into the Redis store may be good choices, though there are many competing solutions available, such as ZeroMQ, Beanstalkd, etc.
So instead of continuously querying your database, you can subscribe to an MQ event and just hang until someone else publishes a message you subscribed to; the MQ will then wake you up and deliver the message. The chat app is a very good use case for understanding this functionality.
I also have to mention that if you search for Comet chat implementations in other languages, you might notice simple ones that don't use an MQ. So how do they exchange the information then? The thing is, such solutions are usually implemented as standalone single-threaded asynchronous servers, so they can store all connections in a thread-local array (or something similar), handle many connections in a single loop, and just pick one and notify it when needed. Such asynchronous server implementations are a modern approach that fits the Comet technique really well. However, you're most likely implementing your Comet on top of mod_php or FastCGI; in that case this simple approach is not an option for you and you should use an MQ.
This could still be very useful to understand how to implement a standalone asynchronous Comet-server to handle many connections in a single thread. Recent versions of PHP support Libevent and Socket Streams, so it is possible to implement such kind of server in PHP as well. There's also an example available in PHP documentation.
How do I check for the results during the Ajax connection is still active? I'm aware of jQuery's success function for ajax calls, but how do I check the data while the connection is still ongoing?
If you're doing your long-running polls with a usual Ajax technique such as plain XHR, jQuery Ajax, etc. you don't have an easy way to transmit several responses in a single Ajax request. As you mentioned you only have 'success' handler to deal with the response in whole and not with its part. As a workaround people send only a single response per request and process it in a 'success' handler, after that they just open a new long-poll request. This is just how HTTP-protocol works.
It should also be mentioned that there are workarounds to implement streaming-like functionality, using techniques such as an infinitely long page in a hidden IFRAME or multipart HTTP responses. Both of those methods have certain drawbacks (the former is considered unreliable and can sometimes produce unwanted browser behavior such as an infinite loading indicator, and the latter lacks consistent and straightforward cross-browser support; however, certain applications are still known to rely on that mechanism successfully, falling back to long polling when the browser can't properly handle multipart responses).
If you'd like to handle multiple responses per single request/connection in a reliable way, you should consider using a more advanced technology such as WebSocket, which is supported by most current browsers, or on any platform that supports raw sockets (such as Flash, or if you develop a mobile app, for instance).
Could you please elaborate more on message queues?
Message Queue is a term that describes a standalone (or built-in) implementation of the Observer pattern (also known as 'Publish/Subscribe' or simply PubSub). If you develop a big application, having one is very useful: it allows you to decouple different parts of your system, implement an event-driven asynchronous design and make your life much easier, especially in heterogeneous systems. It has many applications in real-world systems; I'll mention just a couple of them:
Task queues. Let's say we're writing our own YouTube and need to convert users' video files in the background. We should obviously have a webapp with the UI to upload a movie and some fixed number of worker processes to convert the video files (maybe we would even need a number of dedicated servers where only our workers will live). Also we would probably have to write our workers in C to ensure better performance. All we have to do is set up a message queue server to collect and deliver video-conversion tasks from the webapp to our workers. When a worker spawns, it connects to the MQ and goes idle waiting for new tasks. When someone uploads a video file, the webapp connects to the MQ and publishes a message with a new job. Powerful MQs such as RabbitMQ can distribute tasks equally among the connected workers, keep track of which tasks have been completed, ensure nothing gets lost, and provide fail-over and even an admin UI to browse pending tasks and stats.
Asynchronous behavior. Our Comet chat is a good example. Obviously we don't want to periodically poll our database all the time (what's the use of Comet then? It would not be much different from doing periodic Ajax requests). We would rather have someone notify us when a new chat message appears. And a message queue is that someone. Let's say we're using the Redis key/value store: this is a really great tool that provides a PubSub implementation among its data store features. The simplest scenario may look like the following:
After someone enters the chat room a new Ajax long poll request is being made.
The request handler on the server side issues the command to Redis to subscribe to a 'newmessage' channel.
Once someone enters a message into his chat, the server-side handler publishes a message to the Redis 'newmessage' channel.
Once a message is published, Redis will immediately notify all those pending handlers which subscribed to that channel before.
Upon notification PHP-code that keeps long-poll request open, can return the request with a new chat message, so all users will be notified. They can read new messages from the database at that moment, or the messages may be transmitted directly inside message payload.
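To make the publish/subscribe flow in the steps above concrete, here is a tiny in-process stand-in. It is not Redis and does not cross process boundaries; the class name InProcessPubSub and its methods are hypothetical and only mirror the subscribe and publish steps. With a real Redis client the handlers would live in separate long-poll requests.

```php
<?php
// In-process illustration of publish/subscribe (Observer pattern).
final class InProcessPubSub
{
    private $subscribers = array();

    // Register a handler that is called for every message on $channel.
    public function subscribe($channel, $handler)
    {
        $this->subscribers[$channel][] = $handler;
    }

    // Deliver $message to all handlers of $channel; returns how many received it.
    public function publish($channel, $message)
    {
        $handlers = isset($this->subscribers[$channel])
            ? $this->subscribers[$channel]
            : array();
        foreach ($handlers as $handler) {
            call_user_func($handler, $message);
        }
        return count($handlers);
    }
}

$mq = new InProcessPubSub();
$inbox = array();

// Two "pending long-poll handlers" subscribe to the channel.
$mq->subscribe('newmessage', function ($msg) use (&$inbox) { $inbox[] = "alice sees: $msg"; });
$mq->subscribe('newmessage', function ($msg) use (&$inbox) { $inbox[] = "bob sees: $msg"; });

// Someone enters a chat message; every subscriber is notified.
$delivered = $mq->publish('newmessage', 'hello');
```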
I hope my illustration is easy to understand, however message queues is a very broad topic, so refer to the resources mentioned above for further reading.
How do I check for the results during the Ajax connection is still active? I'm aware of jQuery's success function for ajax calls, but how do I check the data while the connection is still ongoing?
Actually, you can. I've provided a revised answer for the above but I don't know if it's still pending or has been ignored. Providing an update here so that the correct information is available.
If you keep the connection between the client and the server open it is possible to push updates through which are appended to the response. As each update comes in the XMLHttpRequest.onreadystatechange event is fired and the value of the XMLHttpRequest.readyState will be 3. This means that the XMLHttpRequest.responseText continues to grow.
You can see an example of this here:
http://www.leggetter.co.uk/stackoverflow/7213549/
To see the JS code simply view source. The PHP code is:
<?php
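// Number of updates to stream; falls back to 100 when the parameter is missing or invalid.
$updates = isset($_GET['updates']) ? (int) $_GET['updates'] : 100;
if ($updates <= 0) {
    $updates = 100;
}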
header('Content-type: text/plain');
echo str_pad('PADDING', 2048, '|PADDING'); // initial buffer required
$sleep_time = 1;
$count = 0;
$update_suffix = 'Just keep streaming, streaming, streaming. Just keep streaming.';
while ($count < $updates) {
$message = $count . ' >> ' . $update_suffix;
echo($message);
flush();
$count = $count + 1;
sleep($sleep_time);
}
?>
In Gecko-based browsers such as Firefox it's possible to completely replace the responseText by using multipart/x-mixed-replace. I've not provided an example of this.
It doesn't look like it's possible to achieve the same sort of functionality using jQuery.ajax. The success callback does not fire whenever the onreadystatechange event is fired. This is surprising since the documentation states:
No onreadystatechange mechanism is provided, however, since success, error, complete and statusCode cover all conceivable requirements.
So the documentation is potentially wrong unless I'm misinterpreting it?
You can see an example that tries to use jQuery here:
http://www.leggetter.co.uk/stackoverflow/7213549/jquery.html
If you take a look at the network tab in either Firebug or Chrome Developer tools you'll see the file size of stream.php growing, but the success callback still isn't fired.
I'm making a web browser based multiplayer game. I've determined that websockets are the best way to handle communications given its realtime nature. The client uses a HTML5 canvas to render the game and websockets to communicate with the host.
I've elected to use PHP for hosting the game as it seems to be preferred by hosting providers. I haven't used PHP before but have done similar things with websockets in Java, but relying heavily on multithreading.
I've been looking at a few tutorials on php sockets with multiple clients; but most of them do things like fork off new processes for each client. Since I'll have a constantly running game loop I don't think this is suitable.
What I'm trying to achieve is a means of assigning ports to each client as they connect, listening for new clients, exchanging data with the current list of clients and running the game loop all together.
The places where I need help are:
How to find and assign ports to new clients, notify the client of that port, and clean it up when they disconnect.
How to do the above, and all other socket transactions, without blocking the game loop. It would be acceptable to accept messages from clients in partial chunks and only act upon a complete message.
Can anyone give me some technical advice on how to achieve these goals?
I don't think this all looks like too much to ask of PHP but correct me if I'm wrong!
Some pseudocode of what I'd ideally like to achieve server-side. None of the functions should block:
Array clients;
while(gamerunning)
{
CheckForNewClients();
GetStatusFromClients();
DoGameUpdate();
SendGameStateToClients();
}
[Update]
For anyone interested, I created a dedicated application supporting web sockets (specifically using Java, and 'TooTallNates' web socket library) rather than an actual web service as it seemed to make more sense, though incidentally it seems most web browsers have since slung web sockets in the bin due to security issues.
You really need to run a PHP daemon in order to do this effectively (and it NEEDS to be PHP 5.3). I wrote a fairly complete overview of using PHP for daemon processes. Whatever you choose, I would suggest you use an event-based run loop system.
I've designed a basic RunLoop library called LooPHP which could probably be helpful, especially if you're going to be dealing with *_select. I'd be more than happy to answer any questions you have about that.
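For the *_select part, a single non-blocking pass over all sockets might be sketched as below, using only PHP's stream functions. The function name poll_sockets and the 8192-byte read size are assumptions, and this handles raw TCP only; the WebSocket handshake and framing are left out.

```php
<?php
// One non-blocking pass: accept new clients and collect any data that is ready.
// $clients is updated in place; returns the chunks read during this pass.
function poll_sockets($server, array &$clients)
{
    $read = array_merge(array($server), $clients);
    $write = null;
    $except = null;
    $messages = array();

    // A timeout of 0 makes stream_select() return immediately.
    if (@stream_select($read, $write, $except, 0) > 0) {
        foreach ($read as $stream) {
            if ($stream === $server) {
                $client = @stream_socket_accept($server, 0);
                if ($client) {
                    stream_set_blocking($client, false);
                    $clients[] = $client;
                }
                continue;
            }
            $data = fread($stream, 8192);
            if ($data === false || ($data === '' && feof($stream))) {
                // The client went away: forget it and close the socket.
                unset($clients[array_search($stream, $clients, true)]);
                fclose($stream);
                continue;
            }
            $messages[] = $data;
        }
    }
    return $messages;
}

// Usage idea inside a game loop (one listening port shared by all clients):
// $server = stream_socket_server('tcp://0.0.0.0:9000', $errno, $errstr);
// $clients = array();
// while ($gamerunning) { $input = poll_sockets($server, $clients); /* update, send */ }
```

Note that clients do not need a port each; they all connect to the one listening port and are told apart by their socket.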
EDIT:
In an event-based system you don't simply loop over a list of commands; you react to a listener. For example...
Instead of doing:
while( 1 ) {
... /* listen, react */
} /* repeat */
Run loops work by registering listeners (sockets and other async event generators):
class ReactClass { ... }
$loop = new LooPHP_EventLoop( new ReactClass );
//add one time event
$loop->addEvent( function() {
print "This event was called 0.5 seconds after being added\n";
}, 0.5 /* in seconds */ );
//this creates a repeating event, this is called right away and repeats
$add_event = function() use ( $loop, &$add_event ) {
print "This event is REPEATEDLY called every 0.1 seconds\n";
$loop->addEvent( $add_event, 0.1 );
};
$add_event();
//start the loop processing, no events are processed until this is done
$loop->run(); //php doesn't leave this call until the daemon is done
exit(0); //cleanly exit
The above case is a very simple EventLoop with one event source and some manually added timed functions (these can even be added from within a call to ReactClass).
In the application I'm working on, I needed to have asynchronous events feed into the backend (via a socket), and I also needed the ability to call functions at an arbitrary offset from the original event (for timed-out clients, etc.).
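To show what a run loop does with timed events, here is a stripped-down sketch that is independent of LooPHP. The class TinyEventLoop and its two methods are hypothetical and handle timers only, with no sockets; it illustrates the idea and is not how LooPHP is implemented.

```php
<?php
// Minimal timer-only event loop: run() returns once no events are left.
final class TinyEventLoop
{
    private $events = array();

    // Schedule $callback to run $delay seconds from now.
    public function addEvent($callback, $delay = 0.0)
    {
        $this->events[] = array(microtime(true) + $delay, $callback);
    }

    public function run()
    {
        while ($this->events) {
            // Pick the event that is due first.
            usort($this->events, function ($a, $b) {
                return $a[0] < $b[0] ? -1 : ($a[0] > $b[0] ? 1 : 0);
            });
            list($due, $callback) = array_shift($this->events);

            // Sleep until it is due, then react.
            $wait = $due - microtime(true);
            if ($wait > 0) {
                usleep((int) ($wait * 1000000));
            }
            call_user_func($callback);
        }
    }
}

$loop = new TinyEventLoop();
$loop->addEvent(function () { print "second\n"; }, 0.02);
$loop->addEvent(function () { print "first\n"; }, 0.01);
$loop->run();
```

Callbacks may call addEvent() themselves, which is how a repeating event like the one above can be built.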
If you'd like more examples, you can find them over at github.
I hope you find this useful.
I wouldn't suggest using PHP for this type of application. PHP doesn't officially support multithreading and running a PHP script for an undefined period of time (like a server) isn't really an advertised feature.
Of course you could try and make history :)
(please correct me if i'm mistaken)
I'm planning to write a system which should accept input from users (from browser), make some calculations and show updated data to all users, currently visiting certain website.
Input can come once an hour, but can also come 100 times each second. It is VERY important not to lose any user input, but really register and process ALL of it.
So, the idea was to create two programs. One will receive data (input) from the browser and store it somehow in a queue (maybe an array, to be really fast?). The second program should wait until there are new items in the queue (saving resources) and then become active and begin to process the queue items. Both programs should run asynchronously.
I know PHP, so I would write the first program in PHP. But I'm not sure about the second part. I'm not sure how to send an event from the first to the second program. I need some advice at this point. Are threads not possible with PHP? I need some ideas on how to create the system I described.
I would use comet server to communicate feedback to the website the input came from (this part already tested)
As per the comments above, trivially you appear to be describing a message queueing / processing system, however looking at your question in more depth this is probably not the case:
Both programs should run asynchronously.
Having a program which processes a request from a browser but does it asynchronously is an oxymoron. While you could handle the enqueueing of a message after dealing with the HTTP request, it's still a synchronous process.
It is VERY important not to lose any user input
PHP is not a good language for writing control systems for nuclear reactors (nor, according to Microsoft, is Java). HTTP and TCP/IP are not ideal for real time systems either.
100 times each second
Sorry - I thought you meant there could be a lot of concurrent requests. This is not a huge amount.
You seem to be confusing the objective of using COMET / Ajax with asynchronous processing of the application. Even with very large amounts of data, it should be possible to handle the interaction using a single php script working synchronously.