I found a simple class for doing parallel requests:
class Requests {
    public $handle;

    public function __construct() {
        $this->handle = curl_multi_init();
    }

    public function process($urls, $callback) {
        foreach ($urls as $url) {
            $ch = curl_init($url);
            curl_setopt_array($ch, array(CURLOPT_RETURNTRANSFER => TRUE));
            curl_multi_add_handle($this->handle, $ch);
        }
        do {
            $mrc = curl_multi_exec($this->handle, $active);
            if ($state = curl_multi_info_read($this->handle)) {
                $info = curl_getinfo($state['handle']);
                $callback(curl_multi_getcontent($state['handle']), $info);
                curl_multi_remove_handle($this->handle, $state['handle']);
            }
            usleep(10000); // stop wasting CPU cycles and rest for a couple ms
        } while ($mrc == CURLM_CALL_MULTI_PERFORM || $active);
    }

    public function __destruct() {
        curl_multi_close($this->handle);
    }
}
This should be used in the following way:
$dataprocess = function($data, $info) {
    echo $data;
};

$urls = array('url1', 'url2', 'url3');
$rqs = new Requests();
$rqs->process($urls, $dataprocess);
However, it looks like not all of the URLs are being fetched (I'd estimate that only about half of them come back).
I found this note in the description of PHP's curl_multi_exec() function:
If it returns CURLM_CALL_MULTI_PERFORM you better call it again soon, as that is a signal that it still has local data to send or remote data to receive.
So I suspect that this class returns too early, or should repeat the request in some cases. But the class checks both the curl_multi_exec() return value and the $active parameter, so it looks like it should work fine.
Any thoughts?
UPDATE
What I've done for the moment is wrap the whole body of process() in a loop that runs until all the URLs are retrieved (while debugging I can see that after each iteration the number of unfetched URLs drops, e.g. 50-22-8-0).
But I changed the class dramatically: instead of a callback function I'm passing in an array with two key names (one for the URL and one for storing the content). So it is working for me now, but I still cannot figure out how to do this in the callback style.
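For reference, here is my current best guess at a callback-friendly process() (untested, and I'm not sure it's the right fix). The idea is to drain every finished handle on each pass instead of at most one, and to wait on curl_multi_select() rather than sleeping a fixed amount:

public function process($urls, $callback) {
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, array(CURLOPT_RETURNTRANSFER => TRUE));
        curl_multi_add_handle($this->handle, $ch);
    }
    do {
        // let curl do its work; repeat immediately while it asks us to
        do {
            $mrc = curl_multi_exec($this->handle, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);

        // drain ALL finished transfers, not just one per pass
        while ($state = curl_multi_info_read($this->handle)) {
            $info = curl_getinfo($state['handle']);
            $callback(curl_multi_getcontent($state['handle']), $info);
            curl_multi_remove_handle($this->handle, $state['handle']);
            curl_close($state['handle']);
        }

        // block until there is activity instead of sleeping blindly
        if ($active) {
            curl_multi_select($this->handle, 1.0);
        }
    } while ($active);
}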
Related
I found a function here: http://archevery.blogspot.com/2013/07/php-curl-multi-threading.html
I am using it to send an array of URLs that are run and processed as quickly as possible via multi-threaded cURL requests. This works great.
Some of the URLs I want to send to it, however, need to be processed in order, not at the same time, but in sequence.
How can I achieve this?
Example:
URL-A URL-B URL-C --> All fire off at the same time
URL-D URL-E --> Must wait for URL-D to finish before URL-E is triggered.
My purpose is for a task management system that allows me to add PHP applications as "Tasks" in the database. I have a header/detail relationship with the tasks so a task with one header and one detail can be sent off multi-threaded, but a task with one header and multiple details must be sent off in the order of the detail tasks.
I can do this by calling curl requests in a loop, but I want them to also fire off the base request (the first task of a sequence) as part of the multi-threaded function. I don't want to have to wait for all sequential tasks to pile up and process in order. That is, the first task of each sequence should run multi-threaded, but subsequent tasks in a sequence need to wait for the previous one to complete before moving on.
I tried this function that I send the multiple tasks to, but it waits for each task to finish before moving on to the next. I need to somehow combine the multi-threaded function from the URL above with this one.
Here is my multithreaded curl function:
function runRequests($url_array, $thread_width = 10) {
    $threads = 0;
    $master = curl_multi_init();
    $curl_opts = array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => 15,
        CURLOPT_TIMEOUT        => 15
    );
    $results = array();
    $count = 0;
    foreach ($url_array as $url) {
        $ch = curl_init();
        curl_setopt_array($ch, $curl_opts);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_multi_add_handle($master, $ch); // push URL for single rec send into curl stack
        $results[$count] = array("url" => $url, "handle" => $ch);
        $threads++;
        $count++;
        if ($threads >= $thread_width) { // start running when stack is full to width
            while ($threads >= $thread_width) {
                //usleep(100);
                while (($execrun = curl_multi_exec($master, $running)) === -1) {}
                curl_multi_select($master);
                // a request was just completed - find out which one and remove it from stack
                while ($done = curl_multi_info_read($master)) {
                    foreach ($results as &$res) {
                        if ($res['handle'] == $done['handle']) {
                            $res['result'] = curl_multi_getcontent($done['handle']);
                        }
                    }
                    curl_multi_remove_handle($master, $done['handle']);
                    curl_close($done['handle']);
                    $threads--;
                }
            }
        }
    }
    do { // finish processing the remaining queue items once all have been added to curl
        //usleep(100);
        while (($execrun = curl_multi_exec($master, $running)) === -1) {}
        curl_multi_select($master);
        while ($done = curl_multi_info_read($master)) {
            foreach ($results as &$res) {
                if ($res['handle'] == $done['handle']) {
                    $res['result'] = curl_multi_getcontent($done['handle']);
                }
            }
            curl_multi_remove_handle($master, $done['handle']);
            curl_close($done['handle']);
            $threads--;
        }
    } while ($running > 0);
    curl_multi_close($master);
    return $results;
}
And here is the single-threaded cURL function:
function runSingleRequests($url_array) {
    foreach ($url_array as $url) {
        // Initialize a cURL session.
        $ch = curl_init();
        // Page contents not needed.
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
        // grab URL and pass it to the variable.
        curl_setopt($ch, CURLOPT_URL, $url);
        // process the request.
        $result = curl_exec($ch);
    }
}
Both take an array of URLs as their input.
I currently have an array of all single tasks and another array of all multiple tasks with a "header id" that lets me know what header task each detail task is part of.
Any help on theory or code would be most appreciated.
Thanks!
Why don't you use a rudimentary task scheduler to schedule your requests and follow-ups, instead of running everything at once?
See it in action: https://ideone.com/suTUBS
<?php
class Task
{
    protected $follow_up = [];
    protected $task_callback;

    public function __construct($task_callback)
    {
        $this->task_callback = $task_callback;
    }

    public function addFollowUp(Task $follow_up)
    {
        $this->follow_up[] = $follow_up;
    }

    public function complete()
    {
        foreach ($this->follow_up as $runnable) {
            $runnable->run();
        }
    }

    public function run()
    {
        $callback = $this->task_callback;
        $callback($this);
    }
}

$provided_task_scheduler_from_somewhere = function()
{
    $tasks = [];
    $global_message_thing = 'failed';
    $second_global_message_thing = 'failed';

    $task1 = new Task(function (Task $runner)
    {
        $something_in_closure = function() use ($runner) {
            echo "running task one\n";
            $runner->complete();
        };
        $something_in_closure();
    });

    /**
     * use $global_message_thing as a reference so we can manipulate it
     * This will make sure that the follow up on this one knows the status of what happened here
     */
    $second_follow_up = new Task(function(Task $runner) use (&$global_message_thing)
    {
        echo "second follow up on task one.\n";
        $global_message_thing = "success";
        $runner->complete();
    });

    /**
     * Just doing things in random order to show that order doesn't really matter with a task scheduler
     * just the follow ups
     */
    $tasks[] = $task1;
    $tasks[] = new Task(function(Task $runner)
    {
        echo "running task 2\n";
        $runner->complete();
    });

    $task1->addFollowUp(new Task(function(Task $runner)
    {
        echo "follow up on task one.\n";
        $runner->complete();
    }));
    $task1->addFollowUp($second_follow_up);

    /**
     * Adding the references to our "status" trackers here to know what to print
     * One will still be on failed because we did nothing with it. This way we know it works properly
     * as a control.
     */
    $second_follow_up->addFollowUp(new Task(function(Task $runner) use (&$global_message_thing, &$second_global_message_thing) {
        if ($global_message_thing === "success") {
            echo "follow up on the second follow up, three layers now, w00007!\n";
        }
        if ($second_global_message_thing === "success") {
            echo "you don't see this\n";
        }
        $runner->complete();
    }));
    return $tasks;
};

/**
 * Normally you'd use some aggregating function to build up your task
 * list or a collection of classes. I simulated that here with this callback function.
 */
$tasks = $provided_task_scheduler_from_somewhere();
foreach ($tasks as $task) {
    $task->run();
}
This way you can nest tasks that need to follow each other, and with some clever use of closures you can pass parameters to the executing functions and to the enclosing objects outside them.
In my example the Task object itself is passed to the executing function so the executing function can call complete() when it is done with its job.
When complete() is called, the Task determines whether it has follow-up tasks scheduled and, if so, runs them automatically, and execution works its way down the chain like that.
It's a rudimentary task scheduler, but it should help you on the way to getting steps planned in the order you want them executed.
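To tie this back to your cURL use case, here is a minimal sketch (untested, and fetch() is a hypothetical helper that performs one blocking cURL request): wrap each request in a Task, make the header request of a sequence the task and its detail requests the follow-ups, and independent URLs are just separate tasks.

// fetch() is a hypothetical helper: one blocking cURL GET returning the body.
function fetch($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// URL-D must finish before URL-E starts; URL-A is independent.
$taskD = new Task(function (Task $runner) {
    fetch('http://example.com/url-d');
    $runner->complete();                 // triggers the follow-up (URL-E)
});
$taskD->addFollowUp(new Task(function (Task $runner) {
    fetch('http://example.com/url-e');
    $runner->complete();
}));

$tasks = [$taskD, new Task(function (Task $runner) {
    fetch('http://example.com/url-a');
    $runner->complete();
})];

foreach ($tasks as $task) {
    $task->run();
}

Note that on its own this runs everything sequentially; to actually run the independent sequence heads in parallel you would still combine it with your multi-cURL function, e.g. by calling $task->complete() when the multi handle reports that the head request finished.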
Here's an easier-to-follow example, from: http://arguments.callee.info/2010/02/21/multiple-curl-requests-with-php/
It uses curl_multi_init() and related functions. This family of functions allows you to combine cURL handles and execute them simultaneously.
EXAMPLE
// build the individual requests, but do not execute them
$ch_1 = curl_init('http://webservice.one.com/');
$ch_2 = curl_init('http://webservice.two.com/');
curl_setopt($ch_1, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch_2, CURLOPT_RETURNTRANSFER, true);

// build the multi-curl handle, adding both $ch
$mh = curl_multi_init();
curl_multi_add_handle($mh, $ch_1);
curl_multi_add_handle($mh, $ch_2);

// execute all queries simultaneously, and continue when all are complete
$running = null;
do {
    curl_multi_exec($mh, $running);
} while ($running);

// close the handles
curl_multi_remove_handle($mh, $ch_1);
curl_multi_remove_handle($mh, $ch_2);
curl_multi_close($mh);

// all of our requests are done, we can now access the results
$response_1 = curl_multi_getcontent($ch_1);
$response_2 = curl_multi_getcontent($ch_2);
echo "$response_1 $response_2"; // output results
If both websites take one second to respond, we effectively cut the total fetch time in half compared with requesting them one after the other!
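One caveat with the loop above: do { curl_multi_exec($mh, $running); } while ($running); busy-spins the CPU while waiting. A small variation (a sketch, using the same $mh handle) blocks on curl_multi_select() until there is activity instead:

// execute all queries simultaneously, but sleep until there is activity
$running = null;
do {
    curl_multi_exec($mh, $running);
    if ($running) {
        // block for up to one second waiting for activity on any handle
        curl_multi_select($mh, 1.0);
    }
} while ($running);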
References: https://www.php.net/manual/en/function.curl-multi-init.php
I am trying to use streamedResponse to output progress to my index page in Symfony2.
The code below does show my progress on the API calls as they occur, but I am having trouble rendering the streamed information in an actual view. Right now it just outputs plain text at the top of the page, then renders the view when it is all complete.
I don't want to return the final array and close the function until everything is loaded, but I can't seem to get a regular Twig template to show while I output the progress.
I have tried using render, but nothing seems to truly output that view file to the screen unless I return.
public function indexAction($countryCode)
{
    //anywhere from five to fifteen api calls are going to take place
    foreach ($Widgets as $Widget) {
        $response = new StreamedResponse();
        $curlerUrl = $Widget->getApiUrl()
            . '?action=returnWidgets'
            . '&data=' . urlencode(serialize(array(
                'countryCode' => $countryCode
            )));
        $requestStartTime = microtime(true);
        $curler = $this->get('curler')->curlAUrl($curlerUrl);
        $curlResult = json_decode($curler['body'], true);
        if (isset($curlResult['data'])) {
            //do some processing on the data
        }
        $executionTime = microtime(true) - $requestStartTime;
        $response->setCallback(function() use ($Widget, $executionTime) {
            flush();
            sleep(1);
            var_dump($Widget->getName());
            var_dump($executionTime);
            flush();
        });
        $response->send();
    }
    //rest of indexAction with a return statement
    return array(
        //all the vars my template will need
    );
}
Also, another important detail is that I am trying to render everything with Twig, and there seem to be some interesting issues with that.
As I understand it, you only get one chance to output something to the browser from the server (PHP/Twig), then it's up to JavaScript to make any further changes (like update a progress bar).
I'd recommend using multi-cURL to perform all 15 requests asynchronously. This effectively makes the total request time equal to the slowest request so you can serve your page much faster and maybe eliminate the need for the progress bar.
// Create the multiple cURL handle
$mh = curl_multi_init();
$handles = array();
$responses = array();

// Create and add the cURL handles to the $mh
foreach ($widgets as $widget) {
    $ch = $curler->getHandle($widget->getURL()); // Code that returns a cURL handle
    $handles[] = $ch;
    curl_multi_add_handle($mh, $ch);
}

// Execute the requests
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

// Get the request content
foreach ($handles as $handle) {
    $responses[] = curl_multi_getcontent($handle);

    // Close the handles
    curl_multi_remove_handle($mh, $handle);
    curl_close($handle);
}
curl_multi_close($mh);

// Do something with the responses
// ...
public function processHandles(array $widgets)
{
    // most of the above
    return $responses;
}
You may implement all of the logic in the setCallback() method, so consider this code:
public function indexAction($countryCode)
{
    $Widgets = [];
    $response = new StreamedResponse();
    $curlerService = $this->get('curler');
    $response->setCallback(function() use ($Widgets, $curlerService, $countryCode) {
        foreach ($Widgets as $Widget) {
            $curlerUrl = $Widget->getApiUrl()
                . '?action=returnWidgets'
                . '&data=' . urlencode(serialize(array(
                    'countryCode' => $countryCode
                )));
            $requestStartTime = microtime(true);
            $curler = $curlerService->curlAUrl($curlerUrl);
            $curlResult = json_decode($curler['body'], true);
            if (isset($curlResult['data'])) {
                //do some processing on the data
            }
            flush();
            sleep(1);
            var_dump($Widget->getName());
            var_dump( (microtime(true) - $requestStartTime) );
            flush();
        }
    });

    // Directly return the streamed response object
    return $response;
}
For further reading, see this and this article.
Hope this helps.
How should I multithread some php-cli code that needs a timeout?
I'm using PHP 5.6 on Centos 6.6 from the command line.
I'm not very familiar with multithreading terminology or code. I'll simplify the code here but it is 100% representative of what I want to do.
The non-threaded code currently looks something like this:
$datasets = MyLibrary::getAllRawDataFromDBasArrays();

foreach ($datasets as $dataset) {
    MyLibrary::processRawDataAndStoreResultInDB($dataset);
}

exit; // just for clarity
I need to prefetch all my datasets, and each processRawDataAndStoreResultInDB() cannot fetch its own dataset. Sometimes processRawDataAndStoreResultInDB() takes too long to process a dataset, so I want to limit the amount of time it has to process it.
So you can see that making it multithreaded would
Speed it up by allowing multiple processRawDataAndStoreResultInDB() to execute at the same time
Use set_time_limit() to limit the amount of time each one has to process each dataset
Notice that I don't need to come back to my main program. Since this is a simplification, you can trust that I don't want to collect all the processed datasets and do a single save into the DB after they are all done.
I'd like to do something like:
class MyWorkerThread extends SomeThreadType {
    public function __construct($timeout, $dataset) {
        $this->timeout = $timeout;
        $this->dataset = $dataset;
    }
    public function run() {
        set_time_limit($this->timeout);
        MyLibrary::processRawDataAndStoreResultInDB($this->dataset);
    }
}

$numberOfThreads = 4;
$pool = somePoolClass($numberOfThreads);
$pool->start();

$datasets = MyLibrary::getAllRawDataFromDBasArrays();
$timeoutForEachThread = 5; // seconds

foreach ($datasets as $dataset) {
    $thread = new MyWorkerThread($timeoutForEachThread, $dataset);
    $thread->addCallbackOnTerminated(function() use ($dataset) {
        if ($this->isTimeout()) {
            MyLibrary::saveBadDatasetToDb($dataset);
        }
    });
    $pool->addToQueue($thread);
}

$pool->waitUntilAllWorkersAreFinished();

exit; // for clarity
From my research online I've found the PHP extension pthreads which I can use with my thread-safe php CLI, or I could use the PCNTL extension or a wrapper library around it (say, Arara/Process)
https://github.com/krakjoe/pthreads (and the example directory)
https://github.com/Arara/Process (pcntl wrapper)
When I look at them and their examples though (especially the pthreads pool example) I get confused quickly by the terminology and which classes I should use to achieve the kind of multithreading I'm looking for.
I wouldn't even mind creating the pool class myself, if I had isRunning(), isTerminated(), getTerminationStatus() and execute() methods on a thread class, as it would be a simple queue.
Can someone with more experience please direct me to which library, classes and functions I should be using to map to my example above? Am I taking the wrong approach completely?
Thanks in advance.
Here comes an example using worker processes. I'm using the pcntl extension.
/**
 * Spawns a worker process and returns its pid or -1
 * if something goes wrong.
 *
 * @param callable $callback function, closure or method to call
 * @return integer
 */
function worker($callback) {
    $pid = pcntl_fork();
    if ($pid === 0) {
        // Child process
        exit($callback());
    } else {
        // Main process or an error
        return $pid;
    }
}
$datasets = array(
    array('test', '123'),
    array('foo', 'bar')
);

$maxWorkers = 1;
$numWorkers = 0;

foreach ($datasets as $dataset) {
    $pid = worker(function () use ($dataset) {
        // Do DB stuff here
        var_dump($dataset);
        return 0;
    });

    if ($pid !== -1) {
        $numWorkers++;
    } else {
        // Handle fork errors here
        echo 'Failed to spawn worker';
    }

    // If $maxWorkers is reached we need to wait
    // for at least one child to return
    if ($numWorkers === $maxWorkers) {
        // $status is passed by reference
        $pid = pcntl_wait($status);
        echo "child process $pid returned $status\n";
        $numWorkers--;
    }
}

// (Non-blocking) wait for the remaining children
while (true) {
    // $status is passed by reference
    $pid = pcntl_wait($status, WNOHANG);
    if (is_null($pid) || $pid === -1) {
        break;
    }
    if ($pid === 0) {
        // Be patient ...
        usleep(50000);
        continue;
    }
    echo "child process $pid returned $status\n";
}
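You also asked about limiting how long each dataset may take. A hedged way to do that with this worker approach (a sketch; it assumes processRawDataAndStoreResultInDB() can simply be killed mid-work, and it relies on SIGALRM's default action of terminating the process, so no signal handler or ticks are needed):

// Sketch: give each worker at most $timeout seconds.
$timeout = 5;

$pid = worker(function () use ($dataset, $timeout) {
    pcntl_alarm($timeout);   // child is killed by SIGALRM if it runs too long
    MyLibrary::processRawDataAndStoreResultInDB($dataset);
    pcntl_alarm(0);          // cancel the alarm once the work finished in time
    return 0;
});

// Later, where the parent reaps children:
$reaped = pcntl_wait($status);
if (pcntl_wifsignaled($status) && pcntl_wtermsig($status) === SIGALRM) {
    // this child hit the timeout
    echo "child process $reaped timed out\n";
}

To call something like MyLibrary::saveBadDatasetToDb() for the timed-out dataset, the parent would also need to keep a pid-to-dataset map so it knows which dataset the reaped pid was working on.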
My web app requires making 7 different SOAP (WSDL) API requests to complete one task (the user has to wait for the result of all the requests). The average response time is 500 ms to 1.7 seconds for each request. I need to run all these requests in parallel to speed up the process.
What's the best way to do that:
pthreads or
Gearman workers
fork process
curl multi (I have to build the XML SOAP body myself)
Well, the first thing to say is that it's never really a good idea to create threads in direct response to a web request; think about how far that will actually scale.
If you create 7 threads for everyone that comes along and 100 people turn up, you'll be asking your hardware to execute 700 threads concurrently, which is quite a lot to ask of anything really...
However, scalability is not something I can usefully help you with, so I'll just answer the question.
<?php
/* the first service I could find that worked without authorization */
define("WSDL", "http://www.webservicex.net/uklocation.asmx?WSDL");

class CountyData {
    /* this works around simplexmlelements being unsafe (and shit) */
    public function __construct(SimpleXMLElement $element) {
        $this->town = (string)$element->Town;
        $this->code = (string)$element->PostCode;
    }

    public function run(){}

    protected $town;
    protected $code;
}

class GetCountyData extends Thread {
    public function __construct($county) {
        $this->county = $county;
    }

    public function run() {
        $soap = new SoapClient(WSDL);
        $result = $soap->getUkLocationByCounty(array(
            "County" => $this->county
        ));
        foreach (simplexml_load_string(
            $result->GetUKLocationByCountyResult) as $element) {
            $this[] = new CountyData($element);
        }
    }

    protected $county;
}

$threads = [];
$thread = 0;
$threaded = true; # change to false to test without threading

$counties = [ # will create as many threads as there are counties
    "Buckinghamshire",
    "Berkshire",
    "Yorkshire",
    "London",
    "Kent",
    "Sussex",
    "Essex"
];

while ($thread < count($counties)) {
    $threads[$thread] =
        new GetCountyData($counties[$thread]);
    if ($threaded) {
        $threads[$thread]->start();
    } else $threads[$thread]->run();
    $thread++;
}

if ($threaded)
    foreach ($threads as $thread)
        $thread->join();

foreach ($threads as $county => $data) {
    printf(
        "Data for %s %d\n", $counties[$county], count($data));
}
?>
Note that the SoapClient instance is not, and cannot be, shared between threads; this may well slow you down, so you might want to enable caching of WSDLs ...
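For example, a short sketch of enabling WSDL caching (the ini settings could equally live in php.ini):

// Enable WSDL caching at runtime (disk cache of downloaded WSDLs)
ini_set('soap.wsdl_cache_enabled', '1');
ini_set('soap.wsdl_cache_ttl', '86400');   // keep cached WSDLs for a day

// Or per client, when constructing it inside run():
$soap = new SoapClient(WSDL, array('cache_wsdl' => WSDL_CACHE_DISK));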
I am using curl_multi to send out emails in a rolling cURL script similar to this one, but I added a CURLOPT_TIMEOUT of 10 seconds and a CURLOPT_CONNECTTIMEOUT of 20 seconds:
http://www.onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/
While testing it, I reduced the timeouts to 1 ms using CURLOPT_TIMEOUT_MS and CURLOPT_CONNECTTIMEOUT_MS respectively, just to see how it handles a timeout. But the timeout kills the entire cURL process. Is there a way to continue with the other transfers even if one times out?
Thanks.
-devo
https://github.com/krakjoe/pthreads
<?php
class Possibilities extends Thread {
    public function __construct($url) {
        $this->url = $url;
    }

    public function run() {
        /*
         * Or use curl, this is quicker to make an example ...
         */
        return file_get_contents($this->url);
    }
}

$threads = array();
$urls = get_my_urls_from_somewhere();

foreach ($urls as $index => $url) {
    $threads[$index] = new Possibilities($url);
    $threads[$index]->start();
}

foreach ($threads as $index => $thread) {
    if (($response = $threads[$index]->join())) {
        /** good, got a response */
    } else { /** we do not care **/ }
}
?>
My guess is that you are using curl_multi because it's the only option for concurrent execution of the code sending out emails ... if this is the case, I do not suggest that you use anything like the code above; I suggest that you thread the calls to mail() directly, as this will be faster and more efficient by far (see the sketch below).
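A minimal sketch of that idea (untested; it assumes plain PHP mail() and a hypothetical $recipients list, which would come from your own code):

<?php
// Sketch only: one thread per email, calling mail() inside run().
class Mailer extends Thread {
    public function __construct($to, $subject, $body) {
        $this->to      = $to;
        $this->subject = $subject;
        $this->body    = $body;
    }

    public function run() {
        // mail() returns true if the message was accepted for delivery
        $this->sent = mail($this->to, $this->subject, $this->body);
    }

    public $to, $subject, $body, $sent;
}

$recipients = array('a@example.com', 'b@example.com'); // hypothetical data
$threads = array();

foreach ($recipients as $i => $to) {
    $threads[$i] = new Mailer($to, "Subject", "Body");
    $threads[$i]->start();
}

foreach ($threads as $thread) {
    $thread->join(); // wait for all mails to be handed off
}
?>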
But now you know, you can thread in PHP .. enjoy :)