How to process GuzzleHTTP async requests without blocking? - php

I need to write a processor that can potentially send out many HTTP requests to an external service. Since I want to maximize performance, I wish to minimize blocking. I'm using PHP 5.6 and GuzzleHTTP.
GuzzleHTTP does have an option for async requests. But since we do have only 1 thread available in PHP, I need to allocate some time for them to be processed. Unfortunately I only see one way to do it - calling wait which blocks until all the requests are processed. That's not what I want.
Instead I'd like to have some method that handles whatever has arrived, and then returns. So that I can do something along the lines of:
$allRequests = [];
while ( !checkIfNeedToEnd() ) {
$newItems = getItemsFromQueue();
$allRequests = $allRequests + spawnRequests($newItems);
GuzzleHttp::processWhatYouCan($allRequests);
removeProcessedRequests($allRequests);
}
Is this possible?

Alright... figured it out myself:
$handler = new \GuzzleHttp\Handler\CurlMultiHandler();
$client = new \GuzzleHttp\Client(['handler' => $handler]);
$promise1 = $client->getAsync("http://www.stackoverflow.com");
$promise2 = $client->getAsync("http://localhost/");
$doneCount = 0;
$promise1->then(function() use(&$doneCount) {
$doneCount++;
echo 'Promise 1 done!';
});
$promise2->then(function() use(&$doneCount) {
$doneCount++;
echo 'Promise 2 done!';
});
$last = microtime(true);
while ( $doneCount < 2 ) {
$now = microtime(true);
$delta = round(($now-$last)*1000);
echo "tick($delta) ";
$last = $now;
$handler->tick();
}
And the output I get is:
tick(0) tick(6) tick(1) tick(0) tick(1001) tick(10) tick(96) Promise 2 done!tick(97) Promise 1 done!
The magic ingredient is creating the CurlMultiHandler yoursef and then calling tick() on that when it's convenient. After that it's promises as usual. And if the queue is empty, tick() returns immediately.
Note that it can still block for up to 1 second (default) if there is no activity. This can be also changed if needed:
$handler = new \GuzzleHttp\Handler\CurlMultiHandler(['select_timeout' => 0.5]);
The value is in seconds, but with floating point.

Related

Restrict function to maximum 100 executions per minute

I have a script that makes multiple POST requests to an API. Rough outline of the script is as follows:
define("MAX_REQUESTS_PER_MINUTE", 100);
function apirequest ($data) {
// post data using cURL
}
while ($data = getdata ()) {
apirequest($data);
}
The API is throttled, it allows users to post up to 100 requests per minute. Additional requests return HTTP error + Retry-After response until the window resets. Note that the server can take anywhere between 100 milliseconds to 100 seconds to process the request.
I need to make sure that my function does not execute more than 100 times per minute. I have tried usleep function to introduce a constant delay of 0.66 seconds but this simply adds one extra minute per minute. An arbitrary value such as 0.1 second results in error one time or another. I log all requests inside a database table along with time, the other solution I used is to probe the table and count the number of requests made within last 60 seconds.
I need a solution that wastes as little time as possible.
I've put Derek's suggestion into code.
class Throttler {
private $maxRequestsPerMinute;
private $getdata;
private $apirequest;
private $firstRequestTime = null;
private $requestCount = 0;
public function __construct(
int $maxRequestsPerMinute,
$getdata,
$apirequest
) {
$this->maxRequestsPerMinute = $maxRequestsPerMinute;
$this->getdata = $getdata;
$this->apirequest = $apirequest;
}
public function run() {
while ($data = call_user_func($this->getdata)) {
if ($this->requestCount >= $this->maxRequestsPerMinute) {
sleep(ceil($this->firstRequestTime + 60 - microtime(true)));
$this->firstRequestTime = null;
$this->requestCount = 0;
}
if ($this->firstRequestTime === null) {
$this->firstRequestTime = microtime(true);
}
++$this->requestCount;
call_user_func($this->apirequest, $data);
}
}
}
$throttler = new Throttler(100, 'getdata', 'apirequest');
$throttler->run();
UPD. I've put its updated version on Packagist so you can use it with Composer: https://packagist.org/packages/ob-ivan/throttler
To install:
composer require ob-ivan/throttler
To use:
use Ob_Ivan\Throttler\JobInterface;
use Ob_Ivan\Throttler\Throttler;
class SalmanJob implements JobInterface {
private $data;
public function next(): bool {
$this->data = getdata();
return (bool)$this->data;
}
public function execute() {
apirequest($this->data);
}
}
$throttler = new Throttler(100, 60);
$throttler->run(new SalmanJob());
Please note there are other packages providing the same functionality (I haven't tested any of them):
https://packagist.org/packages/franzip/throttler
https://packagist.org/packages/andrey-mashukov/throttler
https://packagist.org/packages/queryyetsimple/throttler
I would start by recording initial time when first request is to be made and then count how many requests are being made. Once 60 requests have been made make sure the current time is at least 1 minute after initial time. If not usleep for however long is left until minute is reached. When minute is reached reset count and initial time value.
Here is my go at this:
define("MAX_REQUESTS_PER_MINUTE", 100);
function apirequest() {
static $startingTime;
static $requestCount;
if ($startingTime === null) {
$startingTime = time();
}
if ($requestCount === null) {
$requestCount = 0;
}
$consumedTime = time() - $startingTime;
if ($consumedTime >= 60) {
$startingTime = time();
$requestCount = 0;
} elseif ($requestCount === MAX_REQUESTS_PER_MINUTE) {
sleep(60 - $consumedTime);
$startingTime = time();
$requestCount = 0;
}
$requestCount++;
echo sprintf("Request %3d, Range [%d, %d)", $requestCount, $startingTime, $startingTime + 60) . PHP_EOL;
file_get_contents("http://localhost/apirequest.php");
// the above script sleeps for 200-400ms
}
for ($i = 0; $i < 1000; $i++) {
apirequest();
}
I've tried the naive solutions of static sleeps, counting requests, and doing simple math but they tended to be quite inaccurate, unreliable, and generally introduced far more sleeping that was necessary when they could have been doing work. What you want is something that only starts issuing consequential sleeps when you're approaching your rate-limit.
Lifting my solution from a previous problem for those sweet, sweet internet points:
I used some math to figure out a function that would sleep for the correct sum of time over the given request, and allow me to ramp it up exponentially towards the end.
If we express the sleep as:
y = e^( (x-A)/B )
where A and B are arbitrary values controlling the shape of the curve, then the sum of all sleeps, M, from 0 to N requests would be:
M = 0∫N e^( (x-A)/B ) dx
This is equivalent to:
M = B * e^(-A/B) * ( e^(N/B) - 1 )
and can be solved with respect to A as:
A = B * ln( -1 * (B - B * e^(N/B)) / M )
While solving for B would be far more useful, since specifying A lets you define a what point the graph ramps up aggressively, the solution to that is mathematically complex, and I've not been able to solve it myself or find anyone else that can.
/**
* #param int $period M, window size in seconds
* #param int $limit N, number of requests permitted in the window
* #param int $used x, current request number
* #param int $bias B, "bias" value
*/
protected static function ratelimit($period, $limit, $used, $bias=20) {
$period = $period * pow(10,6);
$sleep = pow(M_E, ($used - self::biasCoeff($period, $limit, $bias))/$bias);
usleep($sleep);
}
protected static function biasCoeff($period, $limit, $bias) {
$key = sprintf('%s-%s-%s', $period, $limit, $bias);
if( ! key_exists($key, self::$_bcache) ) {
self::$_bcache[$key] = $bias * log( -1 * ( ($bias - $bias * pow(M_E, $limit/$bias)) / $period ) );
}
return self::$_bcache[$key];
}
With a bit of tinkering I've found that B = 20 seems to be a decent default, though I have no mathematical basis for it. Something something slope mumble mumble exponential bs bs.
Also, if anyone wants to solve that equation for B for me I've got a bounty up on math.stackexchange.
Though I believe that our situations differ slightly in that my API provider's responses all included the number of available API calls, and the number still remaining within the window. You may need additional code to track this on your side instead.

Is there another way to implements Long Polling in PHP

I have read some articles (like this, or this), and all of them give me the same way to implements Long Polling in PHP (using usleep() and loop), like that:
$source; // some data source - db, etc
$data = null; // our return data
$timeout = 30; // timeout in seconds
$now = time(); // start time
// loop for $timeout seconds from $now until we get $data
while((time() - $now) < $timeout) {
// fetch $data
$data = $source->getData();
// if we got $data, break the loop
if (!empty($data)) break;
// wait 1 sec to check for new $data
usleep(10000);
}
// if there is no $data, tell the client to re-request (arbitrary status message)
if (empty($data)) $data = array('status'=>'no-data');
// send $data response to client
echo json_encode($data);
Is there another way? I know that PHP is a script language only, but i would like a way that base on event rather than checking and doing or waiting until timeout. It maybe be something like Continuations in Java that would be perfect.
You could try React: http://reactphp.org/
Is not very mature yet, but it may suit your needs. Instead of doing long pooling, you can do it async.
I would recommend: http://ape-project.org/
mature and scalable

What would cause fastcgi_finish_request to take several seconds to execute?

I have run into a rather strange issue with a particular part of a large PHP application. The portion of the application in question loads data from MySQL (mostly integer data) and builds a JSON string which gets output to the browser. These requests were taking sevar seconds (8 - 10 seconds each) in Chrome's developer tools as well as via curl. However the PHP shutdown handler I had reported that the requests were executing in less than 1 second.
In order to debug I added a call to fastcgi_finish_request(), and suddenly my shutdown handler reported the same time as Chrome / curl.
With some debugging, I narrowed it down to a particular function. I created the following simple test case:
<?php
$start_time = DFStdLib::exec_time();
$product = new ApparelQuoteProduct(19);
$pmatrix = $product->productMatrix();
// This function call is the problem:
$ranges = $pmatrix->ranges();
$end_time = DFStdLib::exec_time();
$duration = $end_time - $start_time;
echo "Output generation duration was: $duration sec";
fastcgi_finish_request();
$fastcgi_finish_request_duration = DFStdLib::exec_time() - $end_time;
DFSkel::log(DFSkel::LOG_INFO,"Output generation duration was: $duration sec; fastcgi_finish_request Duration was: $fastcgi_finish_request_duration sec");
If I call $pmatrix->ranges() (which is a function that executes a number of calls to mysql_query to fetch data and build an in-memory PHP object structure from that data) then I get the output:
Output generation duration was: 0.2563910484314 sec; fastcgi_finish_request Duration was: 7.3854329586029 sec
in my log file. Note that the call to $pmatrix->ranges() does not take long at all, yet somehow it causes the PHP FastCGI handler to take seven seconds to fihish the request. (This is true even if I don't call fastcgi_finish_request -- the browser takes 7-8 seconds to display the data either way)
If I comment out the call to $pmatrix->ranges() I get:
Output generation duration was: 0.0016419887542725 sec; fastcgi_finish_request Duration was: 0.00035214424133301 sec
I can post the entire source for the $pmatrix->ranges() function, but it's very long. I'd like some advice on where to even start looking.
What is it about the PHP FastCGI request process which would even cause such behavior? Does it call destructor functions / garbage collection? Does it close open resources? How can I troubleshoot this further?
EDIT: Here's a larger source sample:
<?php
class ApparelQuote_ProductPricingMatrix_TestCase
{
protected $myProductId;
protected $myQuantityRanges;
private $myProduct;
protected $myColors;
protected $mySizes;
protected $myQuantityPricing;
public function __construct($product)
{
$this->myProductId = intval($product);
}
/**
* Return an array of all ranges for this matrix.
*
* #return array
*/
public function ranges()
{
$this->myLoadPricing();
return $this->myQuantityRanges;
}
protected function myLoadPricing($force=false)
{
if($force || !$this->myQuantityPricing)
{
$this->myColors = array();
$this->mySizes = array();
$priceRec_finder = new ApparelQuote_ProductPricingRecord();
$priceRec_finder->_link = Module_ApparelQuote::dbLink();
$found_recs = $priceRec_finder->find(_ALL,"`product_id`={$this->myProductId}","`qtyrange_id`,`color_id`");
$qtyFinder = new ApparelQuote_ProductPricingQtyRange();
$qtyFinder->_link = Module_ApparelQuote::dbLink();
$this->myQuantityRanges = $qtyFinder->find(_ALL,"`product_id`=$this->myProductId");
$this->myQuantityPricing = array();
foreach ($found_recs as &$r)
{
if(false) $r = new ApparelQuote_ProductPricingRecord();
if(!isset($this->myColors[$r->color_id]))
$this->myColors[$r->color_id] = true;
if(!isset($this->mySizes[$r->size_id]))
$this->mySizes[$r->size_id] = true;
if(!is_array($this->myQuantityPricing[$r->qtyrange_id]))
$this->myQuantityPricing[$r->qtyrange_id] = array();
if(!is_array($this->myQuantityPricing[$r->qtyrange_id][$r->color_id]))
$this->myQuantityPricing[$r->qtyrange_id][$r->color_id] = array();
$this->myQuantityPricing[$r->qtyrange_id][$r->color_id][$r->size_id] = &$r;
}
$this->myColors = array_keys($this->myColors);
$this->mySizes = array_keys($this->mySizes);
}
}
}
$start_time = DFStdLib::exec_time();
$pmatrix = new ApparelQuote_ProductPricingMatrix_TestCase(19);
$ranges = $pmatrix->ranges();
$end_time = DFStdLib::exec_time();
$duration = $end_time - $start_time;
echo "Output generation duration was: $duration sec";
fastcgi_finish_request();
$fastcgi_finish_request_duration = DFStdLib::exec_time() - $end_time;
DFSkel::log(DFSkel::LOG_INFO,"Output generation duration was: $duration sec; fastcgi_finish_request Duration was: $fastcgi_finish_request_duration sec");
Upon continued debugging I have narrowed down to the following lines from the above:
if(!is_array($this->myQuantityPricing[$r->qtyrange_id][$r->color_id]))
$this->myQuantityPricing[$r->qtyrange_id][$r->color_id] = array();
These statements are building an in-memory array structure of all the data loaded from MySQL. If I comment these out, then fastcgi_finish_request takes roughly 0.0001 seconds to run. If I do not comment them out, then fastcgi_finish_request takes 7+ seconds to run.
It's actually the function call to is_array that's the issue here
Changing to:
if(!isset($this->myQuantityPricing[$r->qtyrange_id][$r->color_id]))
Resolves the problem. Why is this?

pthreads stopping already running thread once a condition has been met

I'm still relatively new to PHP and trying to use pthreads to solve an issue. I have 20 threads running processes that end at varying times. Most finish around < 10 seconds or so. I don't need all 20, just 10 detected. Once I get to 10, I would like to kill the threads, or to continue on to the next step.
I have tried using set_time_limit to about 20 seconds for each of the threads, but they ignore it and keep running. I am looping through the jobs looking for the join because I didn't want the rest of the program to run but I'm stuck until the slowest one has finished. While pthreads has reduced the time from around a minute to about 30 seconds, I can shave even more time since the first 10 run in about 3 seconds.
Thanks for any help and here is my code:
$count = 0;
foreach ( $array as $i ) {
$imgName = $this->smsId."_$count.jpg";
$name = "LocalCDN/".$imgName;
$stack[] = new AsyncImageModify($i['largePic'], $name);
$count++;
}
// Run the threads
foreach ( $stack as $t ) {
$t->start();
}
// Check if the threads have finished; push the coordinates into an array
foreach ( $stack as $t ) {
if($t->join()){
array_push($this->imgArray, $t->data);
}
}
class class AsyncImageModify extends \Thread{
public $data;
public function __construct($arg, $name, $container) {
$this->arg = $arg;
$this->name = $name;
}
public function run() {
//tried putting the set_time_limit() here, didn't work
if ($this->arg) {
// Get the image
$didWeGetTheImage = Image::getImage($this->arg, $this->name);
if($didWeGetTheImage){
$timestamp1 = microtime(true);
print_r("Starting face detection $this->arg" . "\n");
print_r(" ");
$j = Image::process1($this->name);
if($j){
// lets go ahead and do our image manipulation at this point
$userPic = Image::process2($this->name, $this->name, 200, 200, false, $this->name, $j);
if($userPic){
$this->data = $userPic;
print_r("Back from process2; the image returned is $userPic");
}
}
$endTime = microtime(true);
$td = $endTime-$timestamp1;
print_r("Finished face detection $this->arg in $td seconds" . "\n");
print_r($j);
}
}
}
It is difficult to guess the functionality of Image::* methods, so I can't really answer in any detail.
What I can say, is that there are very few machines I can think of that are suitable to run 20 concurrent threads in any case. A more suitable setup would be the worker/stackable model. A Worker thread is a reuseable context, and can execute task after task, implemented as Stackables; execution in a multi-threaded environment should always use the least amount of threads to get the most work done possible.
Please see pooling example and other examples that are distributed with pthreads, available on github, additionally, much information regarding usage is contained in past bug reports, if you are still struggling after that ...

determining proper gearman task function to retrieve real-time job status

Very simply, I have a program that needs to perform a large process (anywhere from 5 seconds to several minutes) and I don't want to make my page wait for the process to finish to load.
I understand that I need to run this gearman job as a background process but I'm struggling to identify the proper solution to get real-time status updates as to when the worker actually finishes the process. I've used the following code snippet from the PHP examples:
do {
sleep(3);
$stat = $gmclient->jobStatus($job_handle);
if (!$stat[0]) // the job is known so it is not done
$done = true;
echo "Running: " . ($stat[1] ? "true" : "false") . ", numerator: " . $stat[2] . ", denomintor: " . $stat[3] . "\n";
} while(!$done);
echo "done!\n";
and this works, however it appears that it just returns data to the client when the worker finished telling the job what to do. Instead I want to know when the literal process of the job finished.
My real-life example:
Pull several data feeds from an API (some feeds take longer than others)
Load a couple of the ones that always load fast, place a "Waiting/Loading" animation on the section that was sent off to a worker queue
When the work is done and the results have been completely retrieved, replace the animation with the results
This is a bit late, but I stumbled across this question looking for the same answer. I was able to get a solution together, so maybe it will help someone else.
For starters, refer to the documentation on GearmanClient::jobStatus. This will be called from the client, and the function accepts a single argument: $job_handle. You retrieve this handle when you dispatch the request:
$client = new GearmanClient( );
$client->addServer( '127.0.0.1', 4730 );
$handle = $client->doBackground( 'serviceRequest', $data );
Later on, you can retrieve the status by calling the jobStatus function on the same $client object:
$status = $client->jobStatus( $handle );
This is only meaningful, though, if you actually change the status from within your worker with the sendStatus method:
$worker = new GearmanWorker( );
$worker->addFunction( 'serviceRequest', function( $job ) {
$max = 10;
// Set initial status - numerator / denominator
$job->sendStatus( 0, $max );
for( $i = 1; $i <= $max; $i++ ) {
sleep( 2 ); // Simulate a long running task
$job->sendStatus( $i, $max );
}
return GEARMAN_SUCCESS;
} );
while( $worker->work( ) ) {
$worker->wait( );
}
In versions of Gearman prior to 0.5, you would use the GearmanJob::status method to set the status of a job. Versions 0.6 to current (1.1) use the methods above.
See also this question: Problem With Gearman Job Status

Categories