Async PHP | Processing data into several systems (Advice)

I'm building an integration that sends data to several different systems via REST APIs. I need to process the data as quickly as possible. This is the basic layout:
1. Parse and process data (probably into an array as below):
$data = array("Title" => "Title", "Subtitle" => "Test", .....
2. Submit data into service (1): $result1 = $class1->functionservice1($data);
3. Submit data into service (2): $result2 = $class2->functionservice2($data);
4. Submit data into service (3): $result3 = $class3->functionservice3($data);
5. Report completion: echo "done";
Run as a plain script like the above, I'd have to wait for each function to finish before the next one starts (taking three times longer). Is there an easy way to run each service function asynchronously but wait for all of them to complete before step (5) reports completion? I also need to be able to extract data from each $result and send the combined data as one POST to a fourth service.
Sorry if this is an easy question - I'm a PHP novice
Many thanks, Ben

Yes, there are multiple ways.
The most efficient is to use an event loop that leverages non-blocking I/O to achieve concurrency and cooperative multitasking.
One such event loop implementation is Amp. There's an HTTP client that works with Amp called Artax; an example is included in its README. You should have a look at how promises and coroutines work. There's Amp\wait to mix synchronous code with asynchronous code.
<?php
Amp\run(function() {
    $client = new Amp\Artax\Client;

    // Dispatch two requests at the same time
    $promises = $client->requestMulti([
        'http://www.google.com',
        'http://www.bing.com',
    ]);

    try {
        // Yield control until all requests finish
        list($google, $bing) = (yield Amp\all($promises));
        var_dump($google->getStatus(), $bing->getStatus());
    } catch (Exception $e) {
        echo $e;
    }
});
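Applied to your layout, here's a rough sketch of the same pattern (assuming each of your functionserviceN() calls is rewritten to send its request through Artax and return a promise; the *Async method names are made up for illustration):

// Hedged sketch: run all three (promise-returning) service calls
// concurrently, then block until every one of them has resolved.
$results = Amp\wait(Amp\all([
    'service1' => $class1->functionservice1Async($data),
    'service2' => $class2->functionservice2Async($data),
    'service3' => $class3->functionservice3Async($data),
]));

// $results['service1'] etc. now hold the responses; extract what you
// need from each and POST the combined data to the fourth service here.
echo "done";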
Other ways include using threads and/or processes to achieve concurrency. Using multiple processes is the easiest way if you want to keep your current code, but spawning processes isn't cheap, and using threads in PHP isn't really a good thing to do.

You can also put your code in another PHP file and call it like this:
exec("nohup /usr/bin/php -f your_script.php > /dev/null 2>&1 &");

If you want asynchronicity the way other languages offer it, i.e. using threads, you will need to install the pthreads extension from PECL, because PHP does not support threading out of the box.
You can find an explanation of how to use threads in this question:
How can one use multi threading in PHP applications
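For a rough idea of what that looks like (a sketch only; pthreads requires a thread-safe (ZTS) PHP build, and the class and URLs below are made up):

<?php
// Each thread performs one blocking HTTP call; running several threads
// lets the calls overlap in time.
class ServiceCall extends Thread
{
    public $result;
    private $url;

    public function __construct($url)
    {
        $this->url = $url;
    }

    public function run()
    {
        $this->result = file_get_contents($this->url);
    }
}

$threads = array();
foreach (array('http://service1.example', 'http://service2.example') as $url) {
    $threads[$url] = new ServiceCall($url);
    $threads[$url]->start();
}

foreach ($threads as $thread) {
    $thread->join(); // block until every thread has finished
}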

Related

Asynchronous API calls using PHP

I have a PHP application that fetches prices from various sources. On every execution it loads over 20 API endpoints using Guzzle. Due to the load time of each call, an execution cycle takes 10-30 seconds. If I could make all the calls in parallel, I could cut that down to less than 5 seconds.
What's the easiest way to make parallel API calls in PHP?
You're probably doing the wrong thing. These requests should probably be made at a regular interval in the background and the data cached.
What you're trying to do is possible by leveraging non-blocking I/O. Curl offers this with curl_multi, which is available in Guzzle. But there are also other libraries implementing HTTP clients based on non-blocking I/O without a dependency on ext-curl, such as Artax.
Artax is based on Amp, which provides the basic primitives like the event loop and promises. You can start multiple requests and then wait for the set of promises.
$client = new Amp\Artax\DefaultClient;
$promises = [];

foreach ($urls as $url) {
    $promises[$url] = Amp\call(function () use ($client, $url) {
        // "yield" inside a coroutine awaits the resolution of the promise
        // returned from Client::request(). The generator is then continued.
        $response = yield $client->request($url);

        // Same for the body here. Yielding an Amp\ByteStream\Message
        // buffers the entire message.
        $body = yield $response->getBody();

        return $body;
    });
}

$responses = Amp\Promise\wait(Amp\Promise\all($promises));
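Since the question already uses Guzzle, the same fan-out/fan-in is also available through Guzzle's own promise API, which drives curl_multi under the hood (a sketch, assuming Guzzle 6+):

use GuzzleHttp\Client;
use GuzzleHttp\Promise;

$client = new Client();
$promises = [];

foreach ($urls as $url) {
    // getAsync() queues the request and returns a promise immediately.
    $promises[$url] = $client->getAsync($url);
}

// unwrap() waits for all promises and throws if any request failed.
$responses = Promise\unwrap($promises);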
Try converting the API calls into commands. For example, if you have 20 calls:
exec('php apiCall_1.php >> "./log.txt" &')
exec('php apiCall_2.php >> "./log.txt" &')
.
.
.
.
exec('php apiCall_20.php >> "./log.txt" &')
Notes:
1- All of those commands are fired asynchronously (do not forget the '&' at the end of each command, so it does not wait for the response).
2- Each command would normally store its result (fetched from the server) in a DB collection/table.
3- Along with that, you have to write a method that keeps checking whether the results have been inserted into the DB yet, so you can pick them up and send them back to your API consumer.
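A sketch of the checking method from note 3 (the DSN, table name, and batch_id column are all made up for illustration):

<?php
// Poll until all 20 background calls have written their result rows.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$expected = 20;

do {
    sleep(1); // avoid hammering the database
    $stmt = $pdo->query('SELECT COUNT(*) FROM api_results WHERE batch_id = 42');
    $done = (int) $stmt->fetchColumn();
} while ($done < $expected);

// All results are in; read them back and return them to the consumer.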

Yii Php Executing asynchronous background request

Hi, I'm trying to execute a LONG RUNNING request (action) in the background.
function actionRequest($id)
{
    // execute very long process here in background but continue redirect
    Yii::app()->user->setFlash('success', "Currently processing your request you may check it from time to time.");
    $this->redirect(array('index', 'id' => $id));
}
What I'm trying to achieve is to NOT have the user wait for the request to be processed, since it generally takes 5-10 minutes and the request usually hits a timeout; even if I set the timeout longer, waiting 5-10 minutes isn't a good user experience.
So I want to return to the page immediately, notifying the user that the request is being processed; meanwhile they can still browse and do other things in the application, then come back to the page later and see that the request was processed.
I've looked into the Yii extension backjob. It works, and the redirect is executed immediately (somehow a background request), but when doing other things, like navigating the site, pages don't load; it seems the request is still there, and I cannot continue using the application until it has finished.
A similar extension, runactions, promises the same thing, but I could not even get it to work; it says it 'touches a url', like a fire-and-forget job, but nothing happens.
I've also looked into message queuing services like Gearman and RabbitMQ, but they are highly technical; I couldn't even install Gearman on my Windows machine, so "farming" services won't work for me. Some answers to background processing suggest CRON and AJAX, but those don't sound too good either, and they come with a lot of issues.
Is there any other workaround for asynchronous background processing? I've really searched hard for this, and I'm not looking for advanced/sophisticated solutions like "farming out work to several machines" and the like. Thank you very much!
If you want to be able to run asynchronous jobs via Yii, you may not have a choice but to dabble with some AJAX in order to retrieve the status of the job asynchronously. Here are high-level guidelines that worked for me. Hopefully this will assist you in some way!
Setting up a console action
To run background jobs, you will need to use Yii's console component. Under /protected/commands, create a copy of your web controller that has your actionRequest() (e.g. /protected/commands/BulkCommand.php).
This should allow you to go into your /protected folder and run yiic bulk request.
Keep in mind that if you have not created a console application before, you will need to set up its configuration similar to how you've done it for the web application. A straight copy of /protected/config/main.php into /protected/config/console.php should do 90% of the job.
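For reference, a skeleton of what that command file might look like (a sketch; the action body is yours to fill in):

<?php
// /protected/commands/BulkCommand.php -- in Yii 1.x, "yiic bulk request"
// maps to BulkCommand::actionRequest().
class BulkCommand extends CConsoleCommand
{
    public function actionRequest($id = null)
    {
        // Long-running work moved out of the web action goes here.
    }
}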
Customizing an extension for running asynchronous console jobs
What has worked for me is using a combination of two extensions: CConsole and TConsoleRunner. TConsoleRunner uses popen to run shell scripts, which worked for me on Windows and Ubuntu. I simply merged its run() code into CConsole as follows:
public function popen($shell, $redirectOutput = '')
{
    $shell = $this->resolveCommandLine($shell, false, $redirectOutput);
    $ret = self::RETURN_CODE_SUCCESS;

    if (!$this->displayCommands) {
        ob_start();
    }

    if ($this->isWindows()) {
        pclose(popen('start /b ' . $shell, 'r'));
    } else {
        pclose(popen($shell . ' > /dev/null &', 'r'));
    }

    if (!$this->displayCommands) {
        ob_end_clean();
    }

    return $ret;
}

protected function isWindows()
{
    return PHP_OS == 'WINNT' || PHP_OS == 'WIN32';
}
Afterwards, I changed CConsole's runCommand() to the following:
public function runCommand($command, $args, $async = false, &$outputLines = null, $executor = 'popen')
{
    ...
    switch ($executor) {
        ...
        case 'popen':
            return $this->popen($shell);
        ...
    }
}
Running the asynchronous job
With the above set up, you can now use the following snippet of code to call the yiic bulk request command we created earlier.
$console = new CConsole();
$console->runCommand('bulk request', array(
    '--arg1="argument"',
    '--arg2="argument"',
    '--arg3="argument"',
));
You would insert this in your original actionRequest().
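Put together, the original action might end up looking roughly like this (a sketch; the --id argument is a placeholder):

function actionRequest($id)
{
    // Fire the console job in the background and return immediately.
    $console = new CConsole();
    $console->runCommand('bulk request', array('--id=' . $id));

    Yii::app()->user->setFlash('success', "Currently processing your request you may check it from time to time.");
    $this->redirect(array('index', 'id' => $id));
}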
Checking up on the status
Unfortunately, I'm not sure what kind of work your bulk request is doing. For myself, I was gathering a whole bunch of files and putting them in a folder. I knew going in how many files I expected, so I could easily create a controller action that verifies how many files have been created so far and give a % of the status as a simple division.
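In that file-counting case, the status check can be a trivial controller action (a sketch with made-up paths and totals; adapt the count to however your job records progress):

public function actionStatus($id)
{
    $expected = 100; // total number of files the job is known to produce
    $done = count((array) glob(Yii::app()->getRuntimePath() . "/bulk_$id/*"));

    echo CJSON::encode(array('progress' => round(100 * $done / $expected)));
}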

Run a script which is waiting for a signal and is polling a resource

I'm trying to create a script (preferably in PHP, but Python would be fine too) with the following behaviour:
We register a callback function which should run as soon as we receive a signal with an argument. Then we create an infinite loop (this script should never stop!) to poll a web service with a session (we get logged out every 15 minutes and we don't want to be disconnected!).
Here is the behaviour in pseudo-code:
function CALLBACK($arguments)
{
    CURL(URL, {ARGUMENTS : $arguments});
}

add_handler(SIGNAL, ARGUMENTS, CALLBACK);

$last_poll = time();
while (true)
{
    if (time() - $last_poll > 600)
    {
        CURL(URL_TO_POLL);
        $last_poll = time();
    }
    sleep(1);
}
How can I do that ?
Maybe this helps you:
Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.
More on the official site: http://gearman.org/
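If a full job server is overkill, the pseudo-code above also maps fairly directly onto ext-pcntl and ext-curl in a CLI script. A sketch, with one caveat: a POSIX signal cannot carry an argument in PHP, so this version reads it from a file, which is purely a made-up convention, and the URLs are placeholders too:

<?php
// Register the callback for SIGUSR1.
pcntl_signal(SIGUSR1, function () {
    $arguments = file_get_contents('/tmp/callback_args');

    $ch = curl_init('http://example.com/callback');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, array('arguments' => $arguments));
    curl_exec($ch);
    curl_close($ch);
});

$lastPoll = time();
while (true) {
    pcntl_signal_dispatch(); // deliver any pending signals

    if (time() - $lastPoll > 600) {
        // Poll the web service to keep the session alive.
        $ch = curl_init('http://example.com/poll');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_exec($ch);
        curl_close($ch);
        $lastPoll = time();
    }
    sleep(1);
}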

Downloading pages in parallel using PHP

I have to scrape a web site where I need to fetch multiple URLs and then process them one by one. The current process goes somewhat like this:
I fetch a base URL and get all secondary URLs from that page. Then, for each secondary URL, I fetch it, process the found page, download some photos (which takes quite a long time), and store this data in a database; then I fetch the next URL and repeat the process.
In this process, I think I'm wasting some time fetching the secondary URL at the start of each iteration, so I'm trying to fetch the next URLs in parallel while processing the first iteration.
The solution in my mind is to call a PHP script from the main process, say a downloader, which downloads all the URLs (with curl_multi or wget) and stores them in some database.
My questions are:
How do I call such a downloader asynchronously? I don't want my main script to wait until the downloader completes.
Where can I store the downloaded data, such as shared memory? Of course, other than a database.
Is there any chance that the data gets corrupted while storing and retrieving it, and how can I avoid this?
Also, please let me know if anyone has a better plan.
When I hear that someone uses curl_multi_exec, it usually turns out they just load it with, say, 100 URLs, wait until all complete, process them all, and then start over with the next 100 URLs... Blame me, I was doing so too, but then I found out that it is possible to remove/add handles to curl_multi while something is still in progress, and it really saves a lot of time, especially if you reuse already-open connections. I wrote a small library to handle a queue of requests with callbacks; I'm not posting the full version here, of course ("small" is still quite a bit of code), but here's a simplified version of the main thing to give you the general idea:
public function launch() {
    $channels = $freeChannels = array_fill(0, $this->maxConnections, NULL);
    $activeJobs = array();
    $running = 0;

    do {
        // pick jobs for free channels:
        while ( !(empty($freeChannels) || empty($this->jobQueue)) ) {
            // take free channel, (re)init curl handle and let
            // queued object set options
            $chId = key($freeChannels);
            if (empty($channels[$chId])) {
                $channels[$chId] = curl_init();
            }
            $job = array_pop($this->jobQueue);
            $job->init($channels[$chId]);
            curl_multi_add_handle($this->master, $channels[$chId]);
            $activeJobs[$chId] = $job;
            unset($freeChannels[$chId]);
        }
        $pending = count($activeJobs);

        // launch them:
        if ($pending > 0) {
            // poke it while it wants
            while (($mrc = curl_multi_exec($this->master, $running)) == CURLM_CALL_MULTI_PERFORM);

            // wait for some activity, don't eat CPU
            curl_multi_select($this->master);

            while ($running < $pending && ($info = curl_multi_info_read($this->master))) {
                // some connection(s) finished, locate that job and run response handler:
                $pending--;
                $chId = array_search($info['handle'], $channels);
                $content = curl_multi_getcontent($channels[$chId]);
                curl_multi_remove_handle($this->master, $channels[$chId]);

                // free up this channel
                $freeChannels[$chId] = NULL;

                if ( !array_key_exists($chId, $activeJobs) ) {
                    // impossible, but...
                    continue;
                }
                $activeJobs[$chId]->onComplete($content);
                unset($activeJobs[$chId]);
            }
        }
    } while ( ($running > 0 && $mrc == CURLM_OK) || !empty($this->jobQueue) );
}
In my version, the $jobs are actually instances of a separate class, not of controllers or models. They just handle setting cURL options, parsing the response, and calling a given onComplete callback.
With this structure, new requests start as soon as something in the pool finishes.
Of course, it doesn't really save you anything if the processing takes time as well, not just the retrieval... and it isn't true parallel handling. But I still hope it helps. :)
P.S. It did the trick for me. :) An 8-hour job now completes in 3-4 minutes using a pool of 50 connections. Can't describe the feeling. :) I didn't really expect it to work as planned, because with PHP it rarely works exactly as supposed... That was like "ok, hope it finishes in at least an hour... Wha... Wait... Already?! 8-O"
You can use curl_multi: http://www.somacon.com/p537.php
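For reference, a minimal curl_multi sketch of that approach (simplified: no per-request error handling, and the URLs are placeholders):

<?php
$urls = array('http://example.com/a', 'http://example.com/b');

$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Drive all transfers until none are still running.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // block until there is activity
} while ($running > 0);

$bodies = array();
foreach ($handles as $url => $ch) {
    $bodies[$url] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
}
curl_multi_close($mh);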
You may also want to consider doing this client side and using Javascript.
Another solution is to write a hunter/gatherer that you submit an array of URLs to; it then does the parallel work and returns a JSON array once it's completed.
Put another way: if you had 100 URLs, you could POST that array (probably as JSON as well) to mysite.tld/huntergatherer, which does whatever it wants in whatever language you want and just returns JSON.
Aside from the curl_multi solution, another one is just having a batch of Gearman workers. If you go this route, I've found supervisord a nice way to start a load of daemon workers.
Things you should look at in addition to CURL multi:
Non-blocking streams (example: PHP-MIO)
ZeroMQ for spawning off many workers that do requests asynchronously
While node.js, Ruby's EventMachine, or similar tools are quite great for doing this stuff, the things I mentioned make it fairly easy in PHP too.
Try executing python-pycurl scripts from PHP. It can be easier and faster than PHP's curl.

How to implement event listening in PHP

Here is my problem: I have a script (let's call it comet.php) which is requested by an AJAX client script and waits for a change to happen, like this:
while (no_changes) {
    usleep(100000);
    // check for changes
}
I don't like this too much; it's not very scalable and it's (IMHO) bad practice.
I would like to improve this behaviour with a semaphore(?) or some other concurrent programming technique. Can you please give me some tips on how to handle this? (I know it's not a short answer, but a starting point would be enough.)
Edit: what about LibEvent?
You can solve this problem using ZeroMQ.
ZeroMQ is a library that provides supercharged sockets for plugging things (threads, processes and even separate machines) together.
I assume you're trying to push data from the server to the client. Well, a good way to do that is using the EventSource API (polyfills available).
client.js
Connects to stream.php through EventSource.
var stream = new EventSource('stream.php');

stream.addEventListener('debug', function (event) {
    var data = JSON.parse(event.data);
    console.log([event.type, data]);
});

stream.addEventListener('message', function (event) {
    var data = JSON.parse(event.data);
    console.log([event.type, data]);
});
router.php
This is a long-running process that listens for incoming messages and sends them out to anyone listening.
<?php
$context = new ZMQContext();

$pull = $context->getSocket(ZMQ::SOCKET_PULL);
$pull->bind("tcp://*:5555");

$pub = $context->getSocket(ZMQ::SOCKET_PUB);
$pub->bind("tcp://*:5556");

while (true) {
    $msg = $pull->recv();
    echo "publishing received message $msg\n";
    $pub->send($msg);
}
stream.php
Every user connecting to the site gets his own stream.php. This script is long-running and waits for any messages from the router. Once it gets a new message, it will output this message in EventSource format.
<?php
$context = new ZMQContext();

$sock = $context->getSocket(ZMQ::SOCKET_SUB);
$sock->setSockOpt(ZMQ::SOCKOPT_SUBSCRIBE, "");
$sock->connect("tcp://127.0.0.1:5556");

set_time_limit(0);
ini_set('memory_limit', '512M');

header("Content-Type: text/event-stream");
header("Cache-Control: no-cache");

while (true) {
    $msg = $sock->recv();
    $event = json_decode($msg, true);

    if (isset($event['type'])) {
        echo "event: {$event['type']}\n";
    }

    $data = json_encode($event['data']);
    echo "data: $data\n\n";

    ob_flush();
    flush();
}
To send messages to all users, just send them to the router. The router will then distribute that message to all listening streams. Here's an example:
<?php
$context = new ZMQContext();

$sock = $context->getSocket(ZMQ::SOCKET_PUSH);
$sock->connect("tcp://127.0.0.1:5555");

$msg = json_encode(array('type' => 'debug', 'data' => array('foo', 'bar', 'baz')));
$sock->send($msg);

$msg = json_encode(array('data' => array('foo', 'bar', 'baz')));
$sock->send($msg);
This should prove that you do not need node.js to do realtime programming. PHP can handle it just fine.
Apart from that, socket.io is a really nice way of doing this. And you could connect socket.io to your PHP code via ZeroMQ easily.
See also
ZeroMQ
ZeroMQ PHP Bindings
ZeroMQ is the Answer - Ian Barber (Video)
socket.io
It really depends on what you are doing in your server-side script. There are some situations in which you have no option but to do what you are doing above.
However, if you are doing something that involves a call to a function that blocks until something happens, you can use that to avoid spinning, instead of the usleep() call (which is IMHO the part that would be considered "bad practice").
Say you were waiting for data from a file or some other kind of stream that blocks. You could do this:
while (($str = fgets($fp)) === FALSE) continue;
// Handle the event here
Really, PHP is the wrong language for doing stuff like this. But there are situations (I know because I have dealt with them myself) where PHP is the only option.
As much as I like PHP, I must say that PHP isn't the best choice for this task.
Node.js is much, much better for this kind of thing and it scales really well. It's also pretty simple to implement if you have JS knowledge.
Now, if you don't want to waste CPU cycles, you have to create a PHP script that connects to a server of some sort on a certain port. That server should listen for connections on the chosen port and, every X amount of time, check for whatever you want to check (DB entries for new posts, for example), then dispatch a message to every connected client saying that the new entry is ready; a rough sketch follows below.
Now, it's not that difficult to implement this event-queue architecture in PHP, but it'd take you literally 5 minutes to do it with Node.js and Socket.IO, without worrying about whether it'll work in the majority of browsers.
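Here is that sketch in plain PHP (a long-running CLI script; the port, interval, and message format are all illustrative):

<?php
// Accept client connections and periodically push a notification to
// every connected client.
$server = stream_socket_server('tcp://127.0.0.1:9000', $errno, $errstr);
$clients = array();
$lastCheck = 0;

while (true) {
    $read = array_merge(array($server), $clients);
    $write = null;
    $except = null;

    // Wait up to a second for a new connection (or other activity).
    if (stream_select($read, $write, $except, 1) > 0 && in_array($server, $read)) {
        $clients[] = stream_socket_accept($server); // new client connected
    }

    if (time() - $lastCheck >= 5) { // "every X amount of time"
        $lastCheck = time();
        // ...check the database for new entries here; if one is found:
        foreach ($clients as $client) {
            @fwrite($client, "new entry ready\n");
        }
    }
}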
I agree with the consensus here that PHP isn't the best solution. You really need to look at dedicated realtime technologies for this asynchronous problem of delivering data from your server to your clients. It sounds like you are trying to implement HTTP long-polling, which isn't an easy thing to get right cross-browser. It's been tackled numerous times by developers of Comet products, so I'd suggest you look at a Comet solution, or even better a WebSocket solution with fallback support for older browsers.
I'd suggest that you let PHP do the web application functionality that it's good at and choose a dedicated solution for your realtime, evented, asynchronous functionality.
You need a realtime library.
One example is Ratchet: http://socketo.me/
The part that takes care of the pub/sub is discussed at http://socketo.me/docs/wamp
The limitation here is that PHP also needs to be the one to initiate the mutable data.
In other words, this won't magically let you subscribe to MySQL updates. But if you can edit the code that updates MySQL, you can add the publish part there.
