I have a script that makes multiple POST requests to an API. A rough outline of the script is as follows:
define("MAX_REQUESTS_PER_MINUTE", 100);
function apirequest ($data) {
// post data using cURL
}
while ($data = getdata ()) {
apirequest($data);
}
The API is throttled: it allows users to post up to 100 requests per minute. Additional requests return an HTTP error plus a Retry-After header until the window resets. Note that the server can take anywhere between 100 milliseconds and 100 seconds to process a request.
I need to make sure that my function does not execute more than 100 times per minute. I have tried the usleep function to introduce a constant delay of 0.66 seconds, but this simply adds one extra minute per minute of work. An arbitrary value such as 0.1 seconds results in an error sooner or later. I log all requests in a database table along with the time; the other solution I have used is to probe the table and count the number of requests made within the last 60 seconds.
I need a solution that wastes as little time as possible.
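For reference, the table-probing approach I mentioned looks roughly like this (a sketch; the request_log table and its columns are made-up names):

// Sketch of the "probe the log table" approach; table/column names are illustrative.
function requestsInLastMinute(PDO $db) {
    $stmt = $db->query(
        "SELECT COUNT(*) FROM request_log
         WHERE requested_at >= NOW() - INTERVAL 60 SECOND"
    );
    return (int)$stmt->fetchColumn();
}

while ($data = getdata()) {
    while (requestsInLastMinute($db) >= MAX_REQUESTS_PER_MINUTE) {
        usleep(100000); // poll every 100 ms until the window frees up
    }
    apirequest($data);
}

The polling itself is exactly the kind of wasted time I'd like to avoid.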
I've put Derek's suggestion into code.
class Throttler {
    private $maxRequestsPerMinute;
    private $getdata;
    private $apirequest;
    private $firstRequestTime = null;
    private $requestCount = 0;

    public function __construct(
        int $maxRequestsPerMinute,
        callable $getdata,
        callable $apirequest
    ) {
        $this->maxRequestsPerMinute = $maxRequestsPerMinute;
        $this->getdata = $getdata;
        $this->apirequest = $apirequest;
    }

    public function run() {
        while ($data = call_user_func($this->getdata)) {
            if ($this->requestCount >= $this->maxRequestsPerMinute) {
                // Sleep away whatever is left of the current window, then start a new one.
                $secondsLeft = (int)ceil($this->firstRequestTime + 60 - microtime(true));
                if ($secondsLeft > 0) {
                    sleep($secondsLeft);
                }
                $this->firstRequestTime = null;
                $this->requestCount = 0;
            }
            if ($this->firstRequestTime === null) {
                $this->firstRequestTime = microtime(true);
            }
            ++$this->requestCount;
            call_user_func($this->apirequest, $data);
        }
    }
}
$throttler = new Throttler(100, 'getdata', 'apirequest');
$throttler->run();
UPD: I've put an updated version on Packagist so you can use it with Composer: https://packagist.org/packages/ob-ivan/throttler
To install:
composer require ob-ivan/throttler
To use:
use Ob_Ivan\Throttler\JobInterface;
use Ob_Ivan\Throttler\Throttler;

class SalmanJob implements JobInterface {
    private $data;

    public function next(): bool {
        $this->data = getdata();
        return (bool)$this->data;
    }

    public function execute() {
        apirequest($this->data);
    }
}
$throttler = new Throttler(100, 60);
$throttler->run(new SalmanJob());
Please note there are other packages providing the same functionality (I haven't tested any of them):
https://packagist.org/packages/franzip/throttler
https://packagist.org/packages/andrey-mashukov/throttler
https://packagist.org/packages/queryyetsimple/throttler
I would start by recording the initial time when the first request is made and then counting how many requests have been made. Once 100 requests have been made, make sure the current time is at least 1 minute after the initial time; if not, usleep for however long is left until the minute is reached. When the minute is reached, reset the count and the initial time value.
Here is my go at this:
define("MAX_REQUESTS_PER_MINUTE", 100);
function apirequest() {
static $startingTime;
static $requestCount;
if ($startingTime === null) {
$startingTime = time();
}
if ($requestCount === null) {
$requestCount = 0;
}
$consumedTime = time() - $startingTime;
if ($consumedTime >= 60) {
$startingTime = time();
$requestCount = 0;
} elseif ($requestCount === MAX_REQUESTS_PER_MINUTE) {
sleep(60 - $consumedTime);
$startingTime = time();
$requestCount = 0;
}
$requestCount++;
echo sprintf("Request %3d, Range [%d, %d)", $requestCount, $startingTime, $startingTime + 60) . PHP_EOL;
file_get_contents("http://localhost/apirequest.php");
// the above script sleeps for 200-400ms
}
for ($i = 0; $i < 1000; $i++) {
apirequest();
}
I've tried the naive solutions of static sleeps, counting requests, and doing simple math, but they tended to be quite inaccurate and unreliable, and generally introduced far more sleeping than was necessary when they could have been doing work. What you want is something that only starts issuing consequential sleeps when you're approaching your rate limit.
Lifting my solution from a previous problem for those sweet, sweet internet points:
I used some math to figure out a function that would sleep for the correct sum of time over a given run of requests, and allow me to ramp it up exponentially towards the end.
If we express the sleep for request x as:
y = e^( (x-A)/B )
where A and B are arbitrary values controlling the shape of the curve, then the sum of all sleeps, M, from 0 to N requests would be:
M = ∫ from 0 to N of e^( (x-A)/B ) dx
This is equivalent to:
M = B * e^(-A/B) * ( e^(N/B) - 1 )
and can be solved with respect to A (divide through by B * ( e^(N/B) - 1 ), then take the logarithm) as:
A = B * ln( -1 * (B - B * e^(N/B)) / M )
While solving for B would be far more useful, since specifying A lets you define at what point the graph ramps up aggressively, the solution to that is mathematically complex, and I've not been able to solve it myself or find anyone else who can.
// Note: these are methods of a class that also declares the cache property:
// protected static $_bcache = array();

/**
 * @param int $period M, window size in seconds
 * @param int $limit  N, number of requests permitted in the window
 * @param int $used   x, current request number
 * @param int $bias   B, "bias" value
 */
protected static function ratelimit($period, $limit, $used, $bias = 20) {
    $period = $period * pow(10, 6); // convert to microseconds for usleep()
    $sleep = pow(M_E, ($used - self::biasCoeff($period, $limit, $bias)) / $bias);
    usleep((int)$sleep);
}

protected static function biasCoeff($period, $limit, $bias) {
    $key = sprintf('%s-%s-%s', $period, $limit, $bias);
    if (!key_exists($key, self::$_bcache)) {
        self::$_bcache[$key] = $bias * log(-1 * (($bias - $bias * pow(M_E, $limit / $bias)) / $period));
    }
    return self::$_bcache[$key];
}
With a bit of tinkering I've found that B = 20 seems to be a decent default, though I have no mathematical basis for it. Something something slope mumble mumble exponential bs bs.
Also, if anyone wants to solve that equation for B for me I've got a bounty up on math.stackexchange.
Though I believe that our situations differ slightly, in that my API provider's responses all included the number of available API calls and the number still remaining within the window. You may need additional code to track this on your side instead.
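For completeness, tracking that server-side state usually just means parsing the rate-limit headers off each response (a sketch; the exact header names here are assumptions and vary by provider):

// Sketch: extract rate-limit state from a raw HTTP header block.
// Header names (X-RateLimit-Remaining, Retry-After) are assumed; check your provider's docs.
function parseRateLimitHeaders($rawHeaders) {
    $state = array('remaining' => null, 'retry_after' => null);
    foreach (explode("\r\n", $rawHeaders) as $line) {
        if (stripos($line, 'X-RateLimit-Remaining:') === 0) {
            $state['remaining'] = (int)trim(substr($line, 22));
        } elseif (stripos($line, 'Retry-After:') === 0) {
            $state['retry_after'] = (int)trim(substr($line, 12));
        }
    }
    return $state;
}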
function cronProcess() {
    # > 100,000 users
    $users = $this->UserModel->getUsers();
    foreach ($users as $user) {
        # Do lots of database Insert/Update/Delete, HTTP request stuff
    }
}
The problem happens when the number of users reaches ~100,000.
I call the function via cURL from a crontab entry.
So what is the best solution for this?
I do a lot of bulk tasks in CakePHP, some processing millions of records. It's certainly possible to do; the key, as others have suggested, is small batches in a loop.
If this is something you're calling from cron, it's probably easier to use a Shell (< v3.5) or the newer Command class (v3.6+) than cURL.
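For example, the crontab entry can then invoke the command directly instead of requesting a URL (a sketch; the path, schedule, and log location are illustrative):

# Illustrative crontab entry: run the users command every 5 minutes
*/5 * * * * cd /var/www/app && bin/cake users >> logs/users_cron.log 2>&1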
Here's generally how I paginate large batches, including some helpful optional things like a progress bar, turning off hydration to speed things up slightly, and showing how many users/second the script was able to process:
<?php
namespace App\Command;

use Cake\Console\Arguments;
use Cake\Console\Command;
use Cake\Console\ConsoleIo;

class UsersCommand extends Command
{
    public function execute(Arguments $args, ConsoleIo $io)
    {
        // I'd guess a Finder would be a more Cake-y way of getting users than a custom "getUsers" function:
        // See https://book.cakephp.org/3.0/en/orm/retrieving-data-and-resultsets.html#custom-finder-methods
        $usersQuery = $this->UserModel->find('users');

        // Get a total so we know how many we're gonna have to process (optional)
        $total = $usersQuery->count();
        if ($total === 0) {
            $io->error("No users found, stopping..");
            $this->abort();
        }

        // Hydration takes extra processing time & memory, which can add up in bulk.
        // Optionally, if able, skip it & work with $user as an array not an object:
        $usersQuery->enableHydration(false);

        $io->info("Grabbing $total users for processing");

        // Optionally show the progress so we can visually see how far we are in the process
        $progress = $io->helper('Progress')->init([
            'total' => $total
        ]);

        // Tune this page value to a size that solves your problem:
        $limit = 1000;
        $offset = 0;

        // Simply drawing the progress bar every loop can slow things down; optionally draw it only
        // every n loops. This sets it to 1/5th the page size:
        $progressInterval = $limit / 5;

        // Optionally track the rate so we can evaluate the speed of the process,
        // helpful when tuning $limit and evaluating the enableHydration effects
        $startTime = microtime(true);

        do {
            $users = $usersQuery->limit($limit)->offset($offset)->toArray();
            $count = count($users);
            $index = 0;
            foreach ($users as $user) {
                $progress->increment(1);
                // Only draw occasionally, for speed
                if ($index % $progressInterval === 0) {
                    $progress->draw();
                }
                $index++;

                ### WORK TIME
                # Do your lots of database Insert/Update/Delete, HTTP request stuff etc. here
                ###
            }
            $progress->draw();
            $offset += $limit; // Increment your offset to the next page
        } while ($count > 0);

        $totalTime = microtime(true) - $startTime;
        $io->out("\nProcessed an average " . ($total / $totalTime) . " Users/sec\n");
    }
}
Check out these sections in the CakePHP docs:
Console Commands
Command Helpers
Using Finders & Disabling Hydration
Hope this helps!
I have a recursive PHP function to calculate the nearest sale price, but I don't know why it recurses infinitely and throws a maximum execution time error.
It looks like this:
function getamazonsaleper($portal)
{
    $cp     = floatval($this->input->post('cp'));     // user provided input
    $sp     = floatval($this->input->post('sp'));     // user provided input
    $gst    = floatval($this->input->post('gst'));    // user provided input
    $rfsp   = floatval($this->input->post('rfsp'));   // user provided input
    $mcp    = (int)($this->input->post('mcp'));       // user provided input
    $weight = floatval($this->input->post('weight')); // user provided input

    $output = $this->getsalepercent($cp, $sp, $gst, $rfsp, $mcp, $weight, $portal);
    return $output;
}

function getsalepercent($cp, $sp, $gst, $rfsp, $mcp, $weight, $portal) // recursive function
{
    $spcost     = ($sp / 100) * $cp;
    $gstamount  = ($spcost / (100 + $gst)) * $gst;
    $rfspamount = $spcost * ($rfsp / 100);
    $mcpamount  = $cp * ($mcp / 100);
    $fixedfee   = $this->getfixedfee($portal, $spcost);
    $weightfee  = $this->getweightprice($portal, $weight);
    $totalcost  = $fixedfee + $weightfee + $rfspamount;
    $gstinput   = $totalcost * (18 / 100);
    $remittances = $spcost - $totalcost - $gstinput;
    $actualprofit = $remittances - $cp - $gstamount + $gstinput;
    $actualprofitpercent = ($actualprofit / $cp) * 100;

    if ($actualprofitpercent >= $mcp) {
        return $sp;
    } elseif ($actualprofitpercent < $mcp) {
        $newsp = (int)($sp + 10);
        $this->getsalepercent($cp, $newsp, $gst, $rfsp, $mcp, $weight, $portal);
    }
}
Can anybody tell me how to resolve this issue? Thanks in advance.
Edited: parameters
$cp = 100;
$sp = 200;
$mcp = 20;
$weight = 0.5;
$gst = 28;
$rfsp = 6.5;
First, a couple of side notes:
- The way you use $gstinput, it cancels itself out when you calculate $actualprofit (it's -$gstinput in $remittances, which then gets added back as +$gstinput).
- $mcpamount seems to go completely unused in the code... I thought for a second you might have simply confused vars when doing the comparison, but of course for $cp = 100 it makes no difference.
Even so, when I made a few calculations using the example values you gave for $sp = 200 (and growing by 10), I got the following values of $actualprofit (which for $cp = 100 is also the value of $actualprofitpercent):
for $sp = 200:
43.25 - $fixedfee - $weightfee
for $sp = 210:
50.4125 - $fixedfee - $weightfee
for $sp = 220:
57.575 - $fixedfee - $weightfee
So for each $sp = $sp + 10 recursion, the value of $actualprofitpercent (without taking into account $fixedfee and $weightfee) grows by 7.1625.
The value of $weightfee should stay the same, but the value of $fixedfee depends on the value of $sp... Could it be that at each recursion getfixedfee() returns a value which grows faster than 7.1625?
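You can test that hypothesis without touching the class by simulating the recursion with a stand-in fee function (a sketch; my_fixedfee() and the $weightfee value are made up, the rest mirrors your formulas):

// Standalone sketch: replay the recursion as a loop and watch whether
// $actualprofitpercent ever catches up with $mcp. my_fixedfee() is a stand-in.
function my_fixedfee($spcost) {
    return $spcost * 0.10; // hypothetical: a fee that grows with the sale price
}

$cp = 100; $sp = 200; $gst = 28; $rfsp = 6.5; $mcp = 20; $weightfee = 15.0;

for ($step = 0; $step < 20; $step++) {
    $spcost       = ($sp / 100) * $cp;
    $gstamount    = ($spcost / (100 + $gst)) * $gst;
    $rfspamount   = $spcost * ($rfsp / 100);
    $totalcost    = my_fixedfee($spcost) + $weightfee + $rfspamount;
    $gstinput     = $totalcost * 0.18;
    $remittances  = $spcost - $totalcost - $gstinput;
    $actualprofit = $remittances - $cp - $gstamount + $gstinput;
    $pct          = ($actualprofit / $cp) * 100;
    printf("sp=%d -> profit%%=%.4f\n", $sp, $pct);
    if ($pct >= $mcp) {
        break;
    }
    $sp += 10;
}

One more thing worth fixing while you're in there: the elseif branch never returns the value of the recursive call, so even when the recursion does terminate, the outer getsalepercent() call yields null.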
I need to write a processor that can potentially send out many HTTP requests to an external service. Since I want to maximize performance, I wish to minimize blocking. I'm using PHP 5.6 and GuzzleHTTP.
GuzzleHTTP does have an option for async requests. But since we have only one thread available in PHP, I need to allocate some time for them to be processed. Unfortunately, I only see one way to do it - calling wait(), which blocks until all the requests are processed. That's not what I want.
Instead, I'd like a method that handles whatever has arrived and then returns, so that I can do something along the lines of:
$allRequests = [];
while ( !checkIfNeedToEnd() ) {
$newItems = getItemsFromQueue();
$allRequests = $allRequests + spawnRequests($newItems);
GuzzleHttp::processWhatYouCan($allRequests);
removeProcessedRequests($allRequests);
}
Is this possible?
Alright... figured it out myself:
$handler = new \GuzzleHttp\Handler\CurlMultiHandler();
$client = new \GuzzleHttp\Client(['handler' => $handler]);

$promise1 = $client->getAsync("http://www.stackoverflow.com");
$promise2 = $client->getAsync("http://localhost/");

$doneCount = 0;
$promise1->then(function() use (&$doneCount) {
    $doneCount++;
    echo 'Promise 1 done!';
});
$promise2->then(function() use (&$doneCount) {
    $doneCount++;
    echo 'Promise 2 done!';
});

$last = microtime(true);
while ($doneCount < 2) {
    $now = microtime(true);
    $delta = round(($now - $last) * 1000);
    echo "tick($delta) ";
    $last = $now;
    $handler->tick();
}
And the output I get is:
tick(0) tick(6) tick(1) tick(0) tick(1001) tick(10) tick(96) Promise 2 done!tick(97) Promise 1 done!
The magic ingredient is creating the CurlMultiHandler yourself and then calling tick() on it when convenient. After that, it's promises as usual. And if the queue is empty, tick() returns immediately.
Note that it can still block for up to 1 second (default) if there is no activity. This can be also changed if needed:
$handler = new \GuzzleHttp\Handler\CurlMultiHandler(['select_timeout' => 0.5]);
The value is in seconds, and fractional values are accepted.
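Plugged back into the loop from the original question, it would look something along these lines (a sketch; checkIfNeedToEnd() and getItemsFromQueue() are the hypothetical functions from above, and $item['url'] is an assumed shape):

$handler = new \GuzzleHttp\Handler\CurlMultiHandler(['select_timeout' => 0.5]);
$client = new \GuzzleHttp\Client(['handler' => $handler]);

$pending = 0;
while (!checkIfNeedToEnd() || $pending > 0) {
    // Spawn requests for any new items.
    foreach (getItemsFromQueue() as $item) {
        $pending++;
        $client->getAsync($item['url'])->then(
            function ($response) use (&$pending) { $pending--; /* handle success */ },
            function ($reason) use (&$pending) { $pending--; /* handle failure */ }
        );
    }
    // Process whatever cURL has ready, then come back around.
    $handler->tick();
}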
I have run into a rather strange issue with a particular part of a large PHP application. The portion of the application in question loads data from MySQL (mostly integer data) and builds a JSON string which gets output to the browser. These requests were taking several seconds (8-10 seconds each) in Chrome's developer tools as well as via curl. However, the PHP shutdown handler I had registered reported that the requests were executing in less than 1 second.
In order to debug I added a call to fastcgi_finish_request(), and suddenly my shutdown handler reported the same time as Chrome / curl.
With some debugging, I narrowed it down to a particular function. I created the following simple test case:
<?php
$start_time = DFStdLib::exec_time();
$product = new ApparelQuoteProduct(19);
$pmatrix = $product->productMatrix();
// This function call is the problem:
$ranges = $pmatrix->ranges();
$end_time = DFStdLib::exec_time();
$duration = $end_time - $start_time;
echo "Output generation duration was: $duration sec";
fastcgi_finish_request();
$fastcgi_finish_request_duration = DFStdLib::exec_time() - $end_time;
DFSkel::log(DFSkel::LOG_INFO,"Output generation duration was: $duration sec; fastcgi_finish_request Duration was: $fastcgi_finish_request_duration sec");
If I call $pmatrix->ranges() (which is a function that executes a number of calls to mysql_query to fetch data and build an in-memory PHP object structure from that data) then I get the output:
Output generation duration was: 0.2563910484314 sec; fastcgi_finish_request Duration was: 7.3854329586029 sec
in my log file. Note that the call to $pmatrix->ranges() does not take long at all, yet somehow it causes the PHP FastCGI handler to take seven seconds to finish the request. (This is true even if I don't call fastcgi_finish_request -- the browser takes 7-8 seconds to display the data either way.)
If I comment out the call to $pmatrix->ranges() I get:
Output generation duration was: 0.0016419887542725 sec; fastcgi_finish_request Duration was: 0.00035214424133301 sec
I can post the entire source for the $pmatrix->ranges() function, but it's very long. I'd like some advice on where to even start looking.
What is it about the PHP FastCGI request process which would even cause such behavior? Does it call destructor functions / garbage collection? Does it close open resources? How can I troubleshoot this further?
EDIT: Here's a larger source sample:
<?php
class ApparelQuote_ProductPricingMatrix_TestCase
{
    protected $myProductId;
    protected $myQuantityRanges;
    private $myProduct;
    protected $myColors;
    protected $mySizes;
    protected $myQuantityPricing;

    public function __construct($product)
    {
        $this->myProductId = intval($product);
    }

    /**
     * Return an array of all ranges for this matrix.
     *
     * @return array
     */
    public function ranges()
    {
        $this->myLoadPricing();
        return $this->myQuantityRanges;
    }

    protected function myLoadPricing($force = false)
    {
        if ($force || !$this->myQuantityPricing) {
            $this->myColors = array();
            $this->mySizes = array();

            $priceRec_finder = new ApparelQuote_ProductPricingRecord();
            $priceRec_finder->_link = Module_ApparelQuote::dbLink();
            $found_recs = $priceRec_finder->find(_ALL, "`product_id`={$this->myProductId}", "`qtyrange_id`,`color_id`");

            $qtyFinder = new ApparelQuote_ProductPricingQtyRange();
            $qtyFinder->_link = Module_ApparelQuote::dbLink();
            $this->myQuantityRanges = $qtyFinder->find(_ALL, "`product_id`=$this->myProductId");

            $this->myQuantityPricing = array();
            foreach ($found_recs as &$r) {
                if (false) $r = new ApparelQuote_ProductPricingRecord(); // IDE type hint only, never executes
                if (!isset($this->myColors[$r->color_id]))
                    $this->myColors[$r->color_id] = true;
                if (!isset($this->mySizes[$r->size_id]))
                    $this->mySizes[$r->size_id] = true;
                if (!is_array($this->myQuantityPricing[$r->qtyrange_id]))
                    $this->myQuantityPricing[$r->qtyrange_id] = array();
                if (!is_array($this->myQuantityPricing[$r->qtyrange_id][$r->color_id]))
                    $this->myQuantityPricing[$r->qtyrange_id][$r->color_id] = array();
                $this->myQuantityPricing[$r->qtyrange_id][$r->color_id][$r->size_id] = &$r;
            }
            $this->myColors = array_keys($this->myColors);
            $this->mySizes = array_keys($this->mySizes);
        }
    }
}
$start_time = DFStdLib::exec_time();
$pmatrix = new ApparelQuote_ProductPricingMatrix_TestCase(19);
$ranges = $pmatrix->ranges();
$end_time = DFStdLib::exec_time();
$duration = $end_time - $start_time;
echo "Output generation duration was: $duration sec";
fastcgi_finish_request();
$fastcgi_finish_request_duration = DFStdLib::exec_time() - $end_time;
DFSkel::log(DFSkel::LOG_INFO,"Output generation duration was: $duration sec; fastcgi_finish_request Duration was: $fastcgi_finish_request_duration sec");
Upon continued debugging, I have narrowed it down to the following lines from the above:
if(!is_array($this->myQuantityPricing[$r->qtyrange_id][$r->color_id]))
$this->myQuantityPricing[$r->qtyrange_id][$r->color_id] = array();
These statements are building an in-memory array structure of all the data loaded from MySQL. If I comment these out, then fastcgi_finish_request takes roughly 0.0001 seconds to run. If I do not comment them out, then fastcgi_finish_request takes 7+ seconds to run.
It's actually the function call to is_array that's the issue here. Changing it to:
if(!isset($this->myQuantityPricing[$r->qtyrange_id][$r->color_id]))
resolves the problem. Why is this?
I'm still relatively new to PHP and trying to use pthreads to solve an issue. I have 20 threads running processes that end at varying times; most finish in around 10 seconds or so. I don't need all 20, just the first 10 detected. Once I get to 10, I would like to kill the remaining threads, or just continue on to the next step.
I have tried using set_time_limit with about 20 seconds for each of the threads, but they ignore it and keep running. I loop through the jobs calling join because I don't want the rest of the program to run yet, but that leaves me stuck until the slowest thread has finished. While pthreads has reduced the time from around a minute to about 30 seconds, I could shave off even more, since the first 10 finish in about 3 seconds.
Thanks for any help; here is my code:
$count = 0;
foreach ($array as $i) {
    $imgName = $this->smsId . "_$count.jpg";
    $name = "LocalCDN/" . $imgName;
    $stack[] = new AsyncImageModify($i['largePic'], $name);
    $count++;
}

// Run the threads
foreach ($stack as $t) {
    $t->start();
}

// Check if the threads have finished; push the coordinates into an array
foreach ($stack as $t) {
    if ($t->join()) {
        array_push($this->imgArray, $t->data);
    }
}

class AsyncImageModify extends \Thread
{
    public $data;
    public $arg;
    public $name;

    public function __construct($arg, $name)
    {
        $this->arg = $arg;
        $this->name = $name;
    }

    public function run()
    {
        // tried putting the set_time_limit() here, didn't work
        if ($this->arg) {
            // Get the image
            $didWeGetTheImage = Image::getImage($this->arg, $this->name);
            if ($didWeGetTheImage) {
                $timestamp1 = microtime(true);
                print_r("Starting face detection $this->arg" . "\n");
                print_r(" ");
                $j = Image::process1($this->name);
                if ($j) {
                    // let's go ahead and do our image manipulation at this point
                    $userPic = Image::process2($this->name, $this->name, 200, 200, false, $this->name, $j);
                    if ($userPic) {
                        $this->data = $userPic;
                        print_r("Back from process2; the image returned is $userPic");
                    }
                }
                $endTime = microtime(true);
                $td = $endTime - $timestamp1;
                print_r("Finished face detection $this->arg in $td seconds" . "\n");
                print_r($j);
            }
        }
    }
}
It is difficult to guess the functionality of the Image::* methods, so I can't really answer in any detail.
What I can say is that there are very few machines I can think of that are suitable to run 20 concurrent threads in any case. A more suitable setup would be the worker/stackable model: a Worker thread is a reusable context that can execute task after task, implemented as Stackables. Execution in a multi-threaded environment should always use the fewest threads that get the most work done possible.
Please see the pooling example and the other examples distributed with pthreads, available on GitHub; additionally, much information regarding usage is contained in past bug reports, if you are still struggling after that ...
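Here is roughly what that model looks like (a sketch against the pthreads v2 API, where Stackable became Threaded; Image::getImage() and the $array loop are lifted from the question, and it assumes a pthreads build that provides Pool::collect()):

// Tasks are lightweight Threaded objects executed by a small pool of reusable Workers.
class ImageTask extends Threaded
{
    public $arg;
    public $name;
    public $data;
    public $done = false;

    public function __construct($arg, $name)
    {
        $this->arg = $arg;
        $this->name = $name;
    }

    public function run()
    {
        $this->data = Image::getImage($this->arg, $this->name);
        $this->done = true;
    }
}

// 4 reusable workers chewing through 20 tasks, instead of 20 concurrent threads.
$pool = new Pool(4);
$tasks = array();
$count = 0;
foreach ($array as $i) {
    $task = new ImageTask($i['largePic'], "LocalCDN/{$count}.jpg");
    $tasks[] = $task;
    $pool->submit($task);
    $count++;
}

// Stop waiting as soon as any 10 tasks have completed.
$finished = 0;
while ($finished < 10) {
    $pool->collect(function ($task) use (&$finished) {
        if ($task->done) {
            $finished++;
            return true; // tell the pool it may discard this task
        }
        return false;
    });
    usleep(10000);
}
$pool->shutdown(); // note: shutdown still waits for workers to drain their queues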