How do I observe a rate limit in PHP? - php

So, I'm requesting data from an API.
So far, my API key is limited to:
10 requests every 10 seconds
500 requests every 10 minutes
Basically, I want to request a specific value from every game the user has played.
That is, for example, about 300 games.
So I have to make 300 requests from my PHP script. How can I slow them down to observe the rate limit?
(It can take time; the site does not have to be fast.)
I tried sleep(), which resulted in my script crashing. Any other ways to do this?

I suggest setting up a cron job that executes every minute, or better yet using Laravel's scheduler, rather than using sleep() or usleep() to imitate a cron.
Here is some information on both:
https://laravel.com/docs/5.1/scheduling
http://www.cyberciti.biz/faq/how-do-i-add-jobs-to-cron-under-linux-or-unix-oses/
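For illustration, here is a minimal sketch of both options; the 'api:fetch-batch' command name and the script path are my assumptions, not something from the linked docs:

// In app/Console/Kernel.php (Laravel 5.1): assuming an Artisan command
// 'api:fetch-batch' that works through the next batch of API requests and
// persists its position (e.g. in the database) between runs.
protected function schedule(Schedule $schedule)
{
    $schedule->command('api:fetch-batch')->everyMinute();
}

// Plain cron alternative: run a stand-alone script once a minute.
// * * * * * /usr/bin/php /path/to/fetch_batch.php

A batch of, say, 20 requests per run (sent in two bursts of 10) stays comfortably inside both the 10-per-10-seconds and the 500-per-10-minutes limits.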

This sounds like a perfect use for the set_time_limit() function. This function lets you specify how long your script may execute, in seconds. For example, if you call set_time_limit(45); at the beginning of your script, the script may run for at most 45 seconds. One great feature of this function is that you can allow your script to execute indefinitely (no time limit) by calling set_time_limit(0);.
You may want to write your script using the following general structure:
<?php
// Ignore user aborts and allow the script
// to run forever
ignore_user_abort(true);
set_time_limit(0);

// Define a constant for how much time must pass between batches of connections:
define('TIME_LIMIT', 10); // Seconds between batches of API requests

$tLast = 0;
while ( /* Some condition to check if there are still API connections that need to be made */ ) {
    if (time() >= ($tLast + TIME_LIMIT)) { // Check if TIME_LIMIT seconds have passed since the last connection batch
        // TIME_LIMIT seconds have passed since the last batch of connections
        /* Use cURL multi to make 10 asynchronous connections to the API */
        // Once all of those connections are made and processed, save the current time:
        $tLast = time();
    } else {
        // TIME_LIMIT seconds have not yet passed
        // Calculate the number of seconds remaining until TIME_LIMIT seconds have passed:
        $timeDifference = $tLast + TIME_LIMIT - time();
        sleep($timeDifference); // Sleep for the calculated number of seconds
    }
} // END WHILE-LOOP

/* Do any additional processing, computing, and output */
?>
Note: In this code snippet, I am also using the ignore_user_abort() function. As noted in the comment on the code, this function just allows the script to ignore a user abort, so if the user closes the browser (or connection) while your script is still executing, the script will continue retrieving and processing the data from the API anyway. You may want to disable that in your implementation, but I will leave that up to you.
Obviously this code is very incomplete, but it should give you a decent understanding of how you could possibly implement a solution for this problem.
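For reference, a rough sketch of the cURL multi step mentioned in the comment above; the endpoint URL and the $gameIds/$offset variables are placeholder assumptions, not part of the answer:

<?php
// Fetch a batch of URLs in parallel with the cURL multi interface.
function fetchBatch(array $urls) {
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }
    // Drive all transfers until every handle has finished
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh); // wait for activity instead of busy-looping
    } while ($running > 0);
    $responses = array();
    foreach ($handles as $ch) {
        $responses[] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $responses;
}

// Usage: one batch of 10 games per TIME_LIMIT window
$urls = array();
foreach (array_slice($gameIds, $offset, 10) as $id) {
    $urls[] = "https://api.example.com/games/" . $id;
}
$responses = fetchBatch($urls);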

Don't slow the individual requests down.
Instead, you'd typically use something like Redis to keep track of requests per IP or per user. Once the limit is hit for a time period, reject (with an HTTP 429 status code, perhaps) until the count resets.
http://redis.io/commands/INCR coupled with http://redis.io/commands/expire would easily do the trick.
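A minimal sketch of that pattern with the phpredis extension, using the asker's 10-requests-per-10-seconds window ($userId and the key name are placeholders):

<?php
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$key = 'ratelimit:' . $userId;
$count = $redis->incr($key);   // INCR creates the key at 1 if it's missing
if ($count === 1) {
    $redis->expire($key, 10);  // start the 10-second window on the first hit
}
if ($count > 10) {
    http_response_code(429);   // Too Many Requests
    exit;
}
// ... handle the request normally ...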

Related

How to limit external API calls from PHP app

I have a PHP app that is overloading an external API with too many calls per minute. I need to limit it to only 20 calls a minute, but I can't seem to figure it out. I have researched the issue and found this and that, but there is a lack of proper documentation and I don't know how it would work. I understand that this is called "rate limiting", but I guess I skipped that in school.
My app is just sending cURL requests in a loop. I have several loops running in the program to gather all this information together. I could just limit the one loop with a timer for 20 per minute, but I have 17 loops running and I have loops within loops. Is it possible to just limit all the cURL requests within my PHP app with a single helper or something and not edit all my code?
There is no way to rate limit PHP functions using any built-in features. You could write a simple wrapper that calls the API only a given number of times per minute. A crude example would look like this:
function callAPI($api) {
    static $lastRequest;
    $maxRequestsPerMin = 20;
    if (isset($lastRequest)) {
        $delay = 60 / $maxRequestsPerMin; // seconds that must pass between calls
        if ((microtime(true) - $lastRequest) < $delay) {
            // Sleep until the delay is reached
            $sleepAmount = (int) (($delay - microtime(true) + $lastRequest) * 1000000);
            usleep($sleepAmount);
        }
    }
    $lastRequest = microtime(true);
    // Call your API here
}
However, this will only rate-limit this particular script. If you execute another one, you will start another counter. Alternatively, you could store some shared state (for example, the timestamp of the last request) either in a flat file or in a database, and check against it every time you want to call the API.
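To illustrate the shared-state idea, here is a sketch using a lock file (the file path is an assumption):

<?php
// Cross-process variant: the timestamp of the last API call is shared through
// a lock file, so every PHP process observes the same 20-per-minute budget.
function callAPIShared($api) {
    $maxRequestsPerMin = 20;
    $delay = 60 / $maxRequestsPerMin;

    $fp = fopen('/tmp/api_rate.lock', 'c+');
    flock($fp, LOCK_EX); // serialize access across processes
    $last = (float) stream_get_contents($fp);
    $wait = $last + $delay - microtime(true);
    if ($wait > 0) {
        usleep((int) ($wait * 1000000));
    }
    // Record the time of this call before releasing the lock
    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, (string) microtime(true));
    flock($fp, LOCK_UN);
    fclose($fp);

    // Call your API here
}

Holding the lock while sleeping is deliberate: waiting processes queue up behind it and are released one per time slot.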
For advanced usage you should look into message queues or ReactPHP. You do not want to hang your server if such functionality is exposed to end users.

Get response from multiple jobs in Gearman, but abort after a timeout

In a nutshell: I want to have an overall timeout for a call to runTasks() in a Gearman client.
I feel like I can't be the first person to want this, but I can't find an example of how to put it together.
Here's what I want to achieve:
In a PHP script, use the Gearman client to start a series of jobs in parallel
Each job will produce some search results, which the PHP script will need to process
Some of the jobs may take some time to run, but I don't want to wait for the slowest. Instead, after N milliseconds, I want to process the results from all of the jobs that have completed, and abort or ignore those that haven't.
Requirements 1 and 2 are simple enough using the PHP GearmanClient's addTask() and runTasks() methods, but this blocks until all the submitted jobs are completed, so doesn't meet requirement 3.
Here are some approaches I've tried so far:
The timeout set with setTimeout() measures the time the connection has been idle, which isn't what I'm interested in.
Using background jobs or tasks, there doesn't seem to be any way of retrieving the data returned by the worker. There are several questions already covering this: 1 2
The custom polling loop in the example for addTaskStatus() is almost what I need, but it uses background jobs, so again can't see any results. It also includes the vague comment "a better method would be to use event callbacks", without explaining what callbacks it means, or which part of the example they'd replace.
The client options include a GEARMAN_CLIENT_NON_BLOCKING mode, but I don't understand how to use it, or whether a non-blocking runTasks() is any different from using addTaskBackground() instead of addTask().
I've seen suggestions that return communication could just use a different mechanism, like a shared data store, but in that case, I might as well ditch Gearman and build a custom solution with RabbitMQ.
I think I've found a workable solution, although I'd still be interested in alternatives.
The key is that calling runTasks() again after an I/O timeout continues to wait for the previous synchronous tasks, so you can build a polling loop out of these parts:
Synchronous, parallel, tasks set up with addTask().
A completion callback set with setCompleteCallback() which tracks which tasks have finished and how many are still pending.
A low I/O timeout set with setTimeout() which acts as your polling frequency.
Repeated calls to runTasks() in a loop, exiting when either all tasks are done, or an overall timeout is reached. This could also have more complex exit conditions, like "after N seconds, or at least X results", etc.
The big downside is that the timeouts issue a PHP Warning, so you have to squash that with a custom error handler or the @ operator.
Here's a fully tested example:
// How long we wait each time around the polling loop if nothing happens
define('LOOP_WAIT_MS', 100);
// How long in total we wait for responses before giving up
define('TOTAL_TIMEOUT_MS', 5000);

$client = new GearmanClient();
$client->addServer();

// This will fire as each job completes.
// In real code, this would save the data for later processing,
// as well as tracking which jobs were completed; here that's a simple counter.
$client->setCompleteCallback(function(GearmanTask $task) use (&$pending) {
    $pending--;
    echo "Complete!\n";
    echo $task->data();
});

// This array can be used to track the tasks created. This example just counts them.
$tasks = [];
// Sample tasks; the workers sleep for the specified number of seconds before returning some data.
$tasks[] = $client->addTask('wait', '2');
$tasks[] = $client->addTask('wait', '2');
$tasks[] = $client->addTask('wait', '2');
$tasks[] = $client->addTask('wait', '2');
$tasks[] = $client->addTask('wait', '2');
$tasks[] = $client->addTask('wait', '2');
$pending = count($tasks);

// This is the key polling loop; runTasks() here acts as "wait for a notification from the server"
$client->setTimeout(LOOP_WAIT_MS);
$start = microtime(true);
do {
    // This will abort with a PHP Warning if no data is received within LOOP_WAIT_MS milliseconds.
    // We suppress the warning and try again, unless we've reached our overall time limit.
    @$client->runTasks();
} while (
    // Exit the loop once we run out of time
    microtime(true) - $start < TOTAL_TIMEOUT_MS / 1000
    // Additional loop exit if all tasks have been completed
    // (this counter is decremented in the complete callback)
    && $pending > 0
);

echo "Finished with $pending tasks unprocessed.\n";
Your use case sounds like what CAN_DO_TIMEOUT was created for:
CAN_DO_TIMEOUT
Same as CAN_DO, but with a timeout value on how long the job
is allowed to run. After the timeout value, the job server will
mark the job as failed and notify any listening clients.
Arguments:
- NULL byte terminated Function name.
- Timeout value.
So for any (worker, function) tuple you can define a maximum time the worker may spend processing a job; after the timeout, the job server marks the job as failed.
Unfortunately there appears to be a bug in the C Server where the timeout is hard-coded at 1000 seconds.
One workaround is to implement your timeout logic outside of Gearman. For example, if you're using cURL, SOAP, sockets, etc., you can often achieve the desired effect by tweaking those settings.
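For example, if the worker's slow part is a cURL call, a per-request timeout can stand in for the job timeout (a sketch; the URL and the 2-second limit are assumptions):

// Inside the worker: cap the slow operation itself instead of relying on Gearman.
$ch = curl_init('https://api.example.com/search'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 2000);        // give up after 2 seconds
$result = curl_exec($ch);
if ($result === false) {
    $result = ''; // timed out or failed: return an empty result promptly
}
curl_close($ch);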

Only one call from concurrent request with 60 sec timeout

I have a function callUpdate() that needs to be executed after every update in the webpage admin.
callUpdate() executes some caching (and takes up to 30 seconds), so it is not important to execute it immediately, but in a reasonable amount of time after the last update, let's say 60 seconds.
The goal is to skip processing if the user (or users) make several consecutive changes in a short amount of time.
here is my current code:
// This is in a separate, stand-alone script that is called asynchronously,
// so hanging for 1 minute does not block the app.
function afterUpdate() {
    $time = time();
    file_put_contents('timer.txt', $time);
    sleep(60);
    if (file_get_contents('timer.txt') == $time) {
        callUpdate();
    }
}
My concerns here are about the sleep() function and whether it takes too many resources
(if I make 10 quick saves, this will start 10 PHP processes, each running for almost 60 seconds).
And what will happen if two users call file_put_contents() on the same file simultaneously?
Please tell me if there is a better approach, and whether there are major issues with mine.
NOTE: data between sessions can be stored only in a file,
and I have limited access to the server setup (APC settings and such).
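On the concurrent-write concern specifically, one easy mitigation (my addition, not from the original thread) is that file_put_contents() accepts a LOCK_EX flag, which takes an exclusive lock for the duration of the write:

// Two overlapping saves can no longer interleave their writes to timer.txt:
file_put_contents('timer.txt', $time, LOCK_EX);

The last writer still wins, which is exactly what this debounce scheme wants.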

Throttling PHPmailer for use in Elgg

I'll be using the social networking software Elgg for an organization that needs to send mass emails to specific groups. The number of emails can range from 10 to 1000 depending on the group. The web host only allows 500 emails per hour, so I need to throttle the script to send one email every 8 seconds.
I'm using PHPmailer with Elgg. PHPmailer says that I should use these two scripts (code below) in conjunction with each other in order to throttle the mailing. I know how I'm going to use the code in the mailing script; I'm just unsure about a couple of things:
1) I don't really understand the purpose of the safe mode check.
2) After looking up set_time_limit(), it looks like I should set this to an amount of time that allows all potential emails to be sent, whether it's 10 or 1000? Or is it a maximum of 30 seconds per loop iteration in case it needs to time out?
3) How should I set this to get what I need?
Links to PHPmailer describing code:
http://phpmailer.worxware.com/index.php?pg=tip_ext
http://phpmailer.worxware.com/index.php?pg=tip_pause
<?php
/* The following code snippet will set the maximum execution time
 * of your script to 300 seconds (5 minutes)
 * Note: set_time_limit() does not work with safe_mode enabled
 */
$safeMode = ( @ini_get('safe_mode') == 'On' || @ini_get('safe_mode') === 1 ) ? TRUE : FALSE;
if ( $safeMode === FALSE ) {
    set_time_limit(300); // Sets maximum execution time to 5 minutes (300 seconds)
    // ini_set("max_execution_time", "300"); // this does the same as set_time_limit(300)
}
echo "max_execution_time " . ini_get('max_execution_time') . "<br>";
/* If you are using a loop to execute your mailing list (example: from a database),
 * put the command in the loop
 */
while (1==1) {
    set_time_limit(30); // sets (or resets) maximum execution time to 30 seconds
    // .... put code to process in here
    if (1!=1) {
        break;
    }
}
?>
and
<?php
/* Note: set_time_limit() does not work with safe_mode enabled */
while (1==1) {
    set_time_limit(30); // sets (or resets) maximum execution time to 30 seconds
    // .... put code to process in here
    usleep(1000000); // sleep for 1 million microseconds - will not work with Windows servers / PHP 4
    // sleep(1);     // sleep for 1 second (use with Windows servers / PHP 4)
    if (1!=1) {
        break;
    }
}
?>
Safe mode is deprecated as of PHP 5.3 and removed in PHP 5.4, so if your install is relatively recent, it's a moot point: http://php.net/manual/en/ini.sect.safe-mode.php#ini.safe-mode
Doing a set_time_limit() will reset the counter, so as long as your code reaches the set_time_limit() call in less time than the limit was set previously (e.g. gets there in 29 seconds, leaving 1 second on clock), the code will reset the timer and get another 30 seconds. However, since you don't want your code to be racy, you should simply disable the time limit entirely.
Personally, I wouldn't dump out one email every 8 seconds. I'd blast out the 500 we're allowed, then have a scheduled job fire up the script once an hour and resume from where the blast left off. This will make things a bit bursty for the mail server, but potentially more efficient in the long run, as it could batch together emails for the same recipient domains. E.g. all @aol.com mails in the group of 500 can go together, rather than forcing the server to connect to AOL multiple times to deliver individual mails.
As well, if you're batching like this, a server failure will only be 'bad' during the few seconds when the script's actually running and building emails. The rest of the time the PHP script won't even be running, and it'll be up to the smtp server to do its thing.
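A sketch of that batch-and-resume idea, assuming a hypothetical mail_queue table with recipient/subject/body columns and a sent_at marker (none of these names are from the original answer):

<?php
// Run once an hour from cron: send the next 500 unsent messages, mark them sent.
use PHPMailer\PHPMailer\PHPMailer; // PHPMailer 6.x namespace

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$rows = $pdo->query(
    "SELECT id, recipient, subject, body FROM mail_queue
     WHERE sent_at IS NULL ORDER BY id LIMIT 500"
);
$mark = $pdo->prepare("UPDATE mail_queue SET sent_at = NOW() WHERE id = ?");

foreach ($rows as $row) {
    $mail = new PHPMailer();
    $mail->setFrom('noreply@example.org'); // placeholder sender
    $mail->addAddress($row['recipient']);
    $mail->Subject = $row['subject'];
    $mail->Body    = $row['body'];
    if ($mail->send()) {
        $mark->execute(array($row['id']));
    }
}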
This may not be of quick and particular help, but I would consider an asynchronous approach.
This involves storing the task to send an email in a queue and having workers which process those tasks.
The simplest way is to just store the emails in a database and have a cron job running on the server which sends them in batches.
The better (but more complex) solution would be to use some sort of message queue system, like ZeroMQ or the heavier-weight RabbitMQ.
The last, and maybe most comfortable, option off the top of my head is to use a web service like MailChimp or Postmark.

php - how to determine execution time?

I have to process many large images (10 MB per image).
How do I determine the execution time needed to process all of the images in the queue?
Determine the time based on the data that we have.
Set the time limit.
Run the process.
And what does the execution time depend on? (Server, internet speed, type of data...?)
@All:
I have changed my approach: I now send 1 request per image,
so with 40 images we will have 40 requests. No need to care about execution time :D
Thanks
You can test your setup with this code:
// first image
$start = time();
// ... work on image ...
$elapsed = time() - $start;
if (ini_get('max_execution_time') < $elapsed * $imageCount) {
    trigger_error("may not be able to finish in time", E_USER_WARNING);
}
Please note two things: in the CLI version of PHP, max_execution_time is hardcoded to 0 / infinity (according to this comment). Also, you may reset the timer by calling set_time_limit() again, like so:
foreach ($imagelist as $image) {
    // ... do image work ....
    // reset timer to 30
    set_time_limit(30);
}
That way, you can let your script run forever, or at least until you're finished with your image processing. If you set these limits via .htaccess, you must enable the appropriate override rules in the Apache configuration (AllowOverride All).
I would suggest (given the limited info in the question) that you use trial and error: run your process and see how long it takes, and increase the time limit until it completes; you might also be able to shorten your process.
Be aware that server processing time can vary a LOT depending on the current load from other processes. If it's a shared server, some other user can be running a script at that exact time, making your script perform only half as well.
I think it's going to be hard to determine the execution time BEFORE the script is run.
I would upload batches (small groups) of images. The number of images would depend on some testing.
For example, run your script several times simultaneously from different pages to see if they all still complete without breaking. If it works with 5 images in the queue, write your script to process 5 images. After the first five images have been processed, store them (write to the database or whatever you need), wait a little bit, then take the next 5 images.
If it works when you run three scripts with 5 images each at the same time, you should be safe doing it once, whatever some other user on the server is doing.
You can change the execution time limit in the php.ini file, or if you don't have access to that file you can set it on the fly with set_time_limit(600) for 600 seconds. I would, however, write smarter code rather than relying on the time limit.
My five cents. Good luck!
