I have a PHP app that is overloading an external API with too many calls per minute. I need to limit it to only 20 calls a minute, but I can't seem to figure it out. I have researched the issue and found this and that, but there is a lack of proper documentation and I don't know how it would work. I understand that this is called "rate limiting", but I guess I skipped that in school.
My app is just sending cURL requests in a loop. I have several loops running in the program to gather all this information together. I could just limit the one loop with a timer for 20 per minute, but I have 17 loops running and I have loops within loops. Is it possible to just limit all the cURL requests within my PHP app with a single helper or something and not edit all my code?
There is no built-in PHP feature for rate-limiting function calls. You could write a simple wrapper that calls the API only a given number of times per minute. A crude example would look like this:
function callAPI($api) {
    static $lastRequest;
    $maxRequestsPerMin = 20;

    if (isset($lastRequest)) {
        $delay = 60 / $maxRequestsPerMin; // minimum number of seconds between requests
        if ((microtime(true) - $lastRequest) < $delay) {
            // Sleep until the delay is reached
            $sleepAmount = ($delay - microtime(true) + $lastRequest) * (1000 ** 2); // seconds to microseconds
            usleep((int) $sleepAmount);
        }
    }

    $lastRequest = microtime(true);

    // Call your API here
}
However, this will only rate-limit this particular script. If you execute another one, you will start another counter. Alternatively, you could keep some shared state (a counter or the time of the last request) in a flat file or a database and check against it every time you want to call the API.
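A minimal sketch of that shared-state idea, using a flat file and flock() to serialize access across processes (the lock-file path is an arbitrary choice for this example):

// Every process that wants to call the API goes through this function,
// which serializes access with an exclusive file lock.
function callAPIShared($api) {
    $maxRequestsPerMin = 20;
    $delay = 60 / $maxRequestsPerMin;        // minimum seconds between requests

    $fp = fopen('/tmp/api_rate_limit.state', 'c+');
    flock($fp, LOCK_EX);                     // only one process past this point at a time

    $lastRequest = (float) stream_get_contents($fp);
    $wait = $delay - (microtime(true) - $lastRequest);
    if ($wait > 0) {
        usleep((int) ($wait * 1000000));
    }

    // Record the time of this request for the next caller
    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, (string) microtime(true));

    flock($fp, LOCK_UN);
    fclose($fp);

    // Call your API here
}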
For more advanced usage you should look into message queues or ReactPHP. You do not want to hang your server if this functionality is exposed to end users.
Related
So, I'm requesting data from an API.
So far, my API key is limited to:
10 requests every 10 seconds
500 requests every 10 minutes
Basically, I want to request a specific value from every game the user has played.
That is about 300 games, for example.
So I have to make 300 requests from my PHP script. How can I slow them down to stay within the rate limit?
(It can take time; the site does not have to be fast.)
I tried sleep(), which resulted in my script crashing. Any other ways to do this?
I suggest setting up a cron job that executes every minute, or better yet, using Laravel's scheduler, rather than using sleep or usleep to imitate a cron.
Here is some information on both:
https://laravel.com/docs/5.1/scheduling
http://www.cyberciti.biz/faq/how-do-i-add-jobs-to-cron-under-linux-or-unix-oses/
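As a rough sketch of that approach, a script invoked by cron once a minute could work through a small batch each run (the queue file, the API URL, and the batch size below are assumptions, not part of your setup):

<?php
// Invoked by cron once a minute, e.g.:
//   * * * * * /usr/bin/php /path/to/fetch_batch.php
// 10 requests per run, spaced a second apart, stays inside both limits
// (10 per 10 seconds, 500 per 10 minutes).

$queueFile = __DIR__ . '/pending_games.txt';   // one game ID per line (assumed format)
$gameIds = is_file($queueFile)
    ? file($queueFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
    : array();

$batch = array_splice($gameIds, 0, 10);        // take up to 10 IDs off the front

foreach ($batch as $gameId) {
    // Placeholder endpoint - substitute the real API call
    $response = file_get_contents('https://api.example.com/games/' . urlencode($gameId));
    // ... extract and store the value you need from $response ...
    sleep(1);                                  // keep at or under 10 requests per 10 seconds
}

// Write back whatever is still pending for the next run
file_put_contents($queueFile, implode(PHP_EOL, $gameIds));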
This sounds like a perfect use for the set_time_limit() function. This function lets you specify how long your script may execute, in seconds. For example, if you call set_time_limit(45); at the beginning of your script, then the script will be allowed to run for at most 45 seconds. One great feature of this function is that you can allow your script to execute indefinitely (no time limit) by calling set_time_limit(0);.
You may want to write your script using the following general structure:
<?php
// Ignore user aborts and allow the script
// to run forever
ignore_user_abort(true);
set_time_limit(0);

// Define constant for how much time must pass between batches of connections:
define('TIME_LIMIT', 10); // Seconds between batches of API requests

$tLast = 0;

while( /* Some condition to check if there are still API connections that need to be made */ ){
    if( time() >= ($tLast + TIME_LIMIT) ){ // Check if TIME_LIMIT seconds have passed since the last connection batch
        // TIME_LIMIT seconds have passed since the last batch of connections

        /* Use cURL multi to make 10 asynchronous connections to the API */

        // Once all of those connections are made and processed, save the current time:
        $tLast = time();
    }else{
        // TIME_LIMIT seconds have not yet passed
        // Calculate the number of seconds remaining until TIME_LIMIT seconds have passed:
        $timeDifference = $tLast + TIME_LIMIT - time();
        sleep( $timeDifference ); // Sleep for the calculated number of seconds
    }
} // END WHILE-LOOP

/* Do any additional processing, computing, and output */
?>
Note: In this code snippet, I am also using the ignore_user_abort() function. As noted in the comment on the code, this function just allows the script to ignore a user abort, so if the user closes the browser (or connection) while your script is still executing, the script will continue retrieving and processing the data from the API anyway. You may want to disable that in your implementation, but I will leave that up to you.
Obviously this code is very incomplete, but it should give you a decent understanding of how you could possibly implement a solution for this problem.
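Since the snippet only stubs out the cURL-multi step, here is a minimal sketch of what a batch of 10 asynchronous requests could look like (the URL list is a placeholder):

<?php
// Minimal cURL-multi batch: fire up to 10 requests concurrently,
// then collect every response once all transfers have finished.
$urls = array( /* up to 10 API URLs for this batch */ );

$mh      = curl_multi_init();
$handles = array();

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh);   // wait for activity instead of busy-looping
    }
} while ($running && $status == CURLM_OK);

foreach ($handles as $ch) {
    $response = curl_multi_getcontent($ch);
    // ... process $response ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>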
Don't slow the individual requests down.
Instead, you'd typically use something like Redis to keep track of requests per IP or per user. Once the limit is hit for a time period, reject (with an HTTP 429 status code, perhaps) until the count resets.
http://redis.io/commands/INCR coupled with http://redis.io/commands/expire would easily do the trick.
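A minimal sketch of that counter using the phpredis extension (the key format and the limit of 20 per minute are just example values):

<?php
// Fixed-window limiter: allow at most $limit requests per key per $window seconds
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

function isAllowed(Redis $redis, $id, $limit = 20, $window = 60)
{
    $key   = 'rate:' . $id . ':' . intdiv(time(), $window);  // one counter per time window
    $count = $redis->incr($key);          // INCR creates the key at 1 if it is missing
    if ($count === 1) {
        $redis->expire($key, $window);    // let the counter disappear after the window
    }
    return $count <= $limit;
}

if (!isAllowed($redis, $_SERVER['REMOTE_ADDR'])) {
    http_response_code(429);              // Too Many Requests
    exit;
}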
I have a function callUpdate() that needs to be executed after every update in the webpage admin.
callUpdate() executes some caching (and takes up to 30 sec), so it is not important to run it immediately, but within a reasonable amount of time after the last update, let's say 60 sec.
The goal is to skip processing if the user (or users) make several consecutive changes in a short amount of time.
Here is my current code:
// This is in a separate, stand-alone script that is called in an asynchronous way,
// so hanging for 1 min does not block the app.
function afterUpdate(){
    $time = time();
    file_put_contents('timer.txt', $time);
    sleep(60);
    if (file_get_contents("timer.txt") == $time) {
        callUpdate();
    }
}
My concerns here are about the sleep() function and whether it takes up too many resources
(if I make 10 quick saves, this will start 10 PHP processes, each running for almost 60 sec).
And what will happen if two users call file_put_contents() on the same file simultaneously?
Please tell me if there is a better approach and if there are any major issues in mine.
NOTE: data between sessions can be stored only in a file,
and I have limited access to the server setup (APC settings and such).
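For what it's worth, would passing the LOCK_EX flag to file_put_contents() address the simultaneous-write concern? A rough sketch of what I mean:

// Take an exclusive lock while writing so two simultaneous saves
// cannot interleave their writes to timer.txt
$time = time();
file_put_contents('timer.txt', $time, LOCK_EX);

sleep(60);

// Read back and only run the expensive update if no newer save happened
if ((int) file_get_contents('timer.txt') === $time) {
    callUpdate();
}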
I can't get into too many specifics as this is a project for work, but anyways..
I'm in the process of writing a SOAP client in PHP that pushes all responses to a MySQL database. My main script makes an initial soap request that retrieves a large set of items (approximately ~4000 at the moment, but the list is expected to grow into hundreds of thousands at some point).
Once this list of 4000 items is returned, I use exec("/usr/bin/php path/to/my/historyScript.php &") to send a history request for each item. The web service API supports up to 30 requests/sec. Below is some pseudo code for what I am currently doing:
$count = 0;

foreach( $items as $item )
{
    if ( $count == 30 )
    {
        sleep(1); // Sleep for one second before starting the next 30 requests
        $count = 0;
    }

    exec('/usr/bin/php path/to/history/script.php &');
    $count++;
}
The problem I'm running into is that I am unsure when the processes finish and my development server is starting to crash. Since data is expected to grow, I know this is a very poor solution to my problem.
Might there be a better approach I should consider for a task like this? I just feel that this is more of a 'hack'.
I am not sure, but I suspect the reason for your application crash is that you are keeping a large set of data in a PHP variable; depending on the available RAM, that amount of data can bring the system down. My suggestion is to limit the amount of data coming in from the external service per request, rather than the number of requests to the service.
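As a rough sketch of what I mean, a paged fetch keeps memory flat ($soapClient, getItems() and the page size are hypothetical stand-ins, not part of your real web service):

// Fetch the item list in pages instead of holding all ~4000 items in memory at once.
// getItems($offset, $limit) is a stand-in for whatever paged call the service offers.
$pageSize = 200;
$offset   = 0;

do {
    $items = $soapClient->getItems($offset, $pageSize);   // hypothetical paged call

    foreach ($items as $item) {
        // queue or process the history request for this item,
        // then let it go out of scope so memory usage stays flat
    }

    $offset += $pageSize;
} while (count($items) === $pageSize);   // a short (final) page ends the loop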
I'm building web app that uses Blekko API( web search API ).
Application is multi-user.
I need to limit calls to the API to 1 call/second.
This limit should apply to all activity by all users, i.e. there should be some schedule for using the API.
I need some suggestions on how to do that.
It sounds like responsiveness to the API calls isn't too important, since you are talking about queueing. If that's the case, I would dump the API request URLs into a database table. Then, with a background worker process, I would do something to this effect:
set_time_limit(0);

$api_requests = array();

while (TRUE)
{
    if (count($api_requests) == 0)
    {
        // get multiple records from the DB to limit queries and add
        // them to the $api_requests array.
        // if the DB returns no results, maybe sleep a few extra seconds
        // to avoid "slamming" the database.
    }

    // get the next API request from the array
    $request = array_shift($api_requests);

    // send API request to Blekko
    // process API results

    // sleep 1 sec
    sleep(1);
}
This is a bit of a "busy" loop, but it will ensure that you never run more than one request per second and also guarantees that a queued request won't wait too long to be processed.
Note: This method does require that your server won't kill the process itself, regardless of the set_time_limit() call. Long running processes are oftentimes killed on shared servers.
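A minimal sketch of the database side of that worker, using PDO against an assumed api_queue table (MySQL syntax; the table and column names are assumptions):

<?php
// Pull a small batch of queued request URLs and mark them as taken.
// Assumes a MySQL table api_queue(id, url, processed_at).
function fetchApiRequests(PDO $db, $batchSize = 10)
{
    $db->beginTransaction();

    $rows = $db->query(
        'SELECT id, url FROM api_queue
         WHERE processed_at IS NULL
         ORDER BY id
         LIMIT ' . (int) $batchSize
    )->fetchAll(PDO::FETCH_ASSOC);

    $mark = $db->prepare('UPDATE api_queue SET processed_at = NOW() WHERE id = ?');
    foreach ($rows as $row) {
        $mark->execute(array($row['id']));
    }

    $db->commit();

    return $rows;
}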
A simple way to do this is to use usleep()
usleep(1000000); will pause the script for 1.0 seconds
Reading http://blog.programmableweb.com/2007/04/02/12-ways-to-limit-an-api/ I wondered how to accomplish time-based limits (i.e. 1 call per second). If authentication is required, is the way to compare the seconds tracked by PHP's time() function (for instance, if ($previous_call_time == $current_call_time) { ... })? Any other suggestions?
Use a simple cache:
if(filemtime("cache.txt") < (int)$_SERVER["REQUEST_TIME"] - 3600) { // 3600 is one hour in seconds
$data = file_get_contents("http://remote.api/url/goes/here");
file_put_contents("cache.txt", $data);
} else
$data = file_get_contents("cache.txt");
Something like this will save the API response and let you get the information whenever you want, while still limiting how often you actually pull data from the feed.
Hope this helps!
The more general limiting criterion is "number of calls per period". For example, per hour, like Twitter does.
You can track the number of requests performed so far by adding a record to MySQL for each request.
A more performant solution is to use memcached and increment a key (user_id + current_hour).
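A minimal sketch of that memcached counter (the key format, the limit, and $currentUserId are assumptions):

<?php
// Per-user, per-hour request counter in memcached
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

function allowCall(Memcached $mc, $userId, $limitPerHour = 100)
{
    $key = 'api:' . $userId . ':' . date('YmdH');   // one key per user per hour

    // add() only succeeds if the key does not exist yet; TTL of one hour
    $mc->add($key, 0, 3600);
    $count = $mc->increment($key);

    return $count !== false && $count <= $limitPerHour;
}

// $currentUserId is whatever identifies the caller in your app (assumption)
if (!allowCall($memcached, $currentUserId)) {
    // over the hourly limit - reject or queue the call
}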
Something to consider: You can use an external service to do this instead of building it yourself. E.g. my company, WebServius ( http://www.webservius.com ) currently supports configurable per-API and per-API-key throttling, and we are likely going to be adding even more features such as "adaptive throttling" (automatically throttle usage when API becomes less responsive).