I'm building a web app that uses the Blekko API (a web search API).
The application is multi-user.
I need to limit calls to the API to 1 call per second.
This limit should apply to all activity by all users, i.e. there should be some schedule for using the API.
Any suggestions on how to do that?
It sounds like responsiveness to the API calls isn't too important since you are talking about queueing. If that's the case, I would dump the API request URL into a database table. Then with a background worker process, I would do something to this effect:
set_time_limit(0);

$api_requests = array();

while (TRUE)
{
    if (count($api_requests) == 0)
    {
        // get multiple records from the DB (to limit database round-trips)
        // and add them to the $api_requests array.
        // if the DB returns no results, maybe sleep a few extra seconds
        // to avoid "slamming" the database.
    }

    // get the next API request from the array
    $request = array_shift($api_requests);
    if ($request === NULL)
    {
        continue; // nothing queued yet
    }

    // send the API request to Blekko
    // process the API results

    // sleep 1 second to stay under the 1 call/second limit
    sleep(1);
}
This is a bit of a "busy" loop, but it will ensure that you never run more than one request per second and also guarantees that a queued request won't wait too long to be processed.
Note: This method does require that your server won't kill the process itself, regardless of the set_time_limit() call. Long running processes are oftentimes killed on shared servers.
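For concreteness, here is a minimal sketch of what that background worker could look like using PDO. The api_queue table, its columns, and the connection details are assumptions for illustration, not part of the original answer:

set_time_limit(0);

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'); // hypothetical DSN

while (true) {
    // Fetch a small batch of queued requests
    $rows = $pdo->query(
        'SELECT id, url FROM api_queue WHERE processed = 0 ORDER BY id LIMIT 10'
    )->fetchAll(PDO::FETCH_ASSOC);

    if (!$rows) {
        sleep(5); // nothing queued; avoid slamming the database
        continue;
    }

    foreach ($rows as $row) {
        $response = file_get_contents($row['url']); // call the Blekko API
        // ... process $response and store the results ...

        $pdo->prepare('UPDATE api_queue SET processed = 1 WHERE id = ?')
            ->execute([$row['id']]);

        sleep(1); // enforce the 1 call/second limit
    }
}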
A simple way to do this is to use usleep()
usleep(1000000); will pause the script for 1.0 seconds
I have a PHP app that is overloading an external API with too many calls per minute. I need to limit it to only 20 calls a minute, but I can't seem to figure it out. I have researched the issue and found this and that, but there is a lack of proper documentation and I don't know how it would work. I understand that this is called "rate limiting", but I guess I skipped that in school.
My app is just sending cURL requests in a loop. I have several loops running in the program to gather all this information together. I could just limit one loop with a timer to 20 calls per minute, but I have 17 loops running, and I have loops within loops. Is it possible to limit all the cURL requests within my PHP app with a single helper or something, without editing all my code?
There is no way to rate-limit PHP functions using any built-in features. You could write a simple wrapper which calls the API only a given number of times per minute. A crude example would look like this:
function callAPI($api) {
    static $lastRequest;
    $maxRequestsPerMin = 20;

    if (isset($lastRequest)) {
        $delay = 60 / $maxRequestsPerMin; // minimum seconds between requests

        if ((microtime(true) - $lastRequest) < $delay) {
            // Sleep until the delay is reached
            $sleepAmount = ($delay - microtime(true) + $lastRequest) * 1000000;
            usleep((int) $sleepAmount);
        }
    }

    $lastRequest = microtime(true);

    // Call your API here
}
However, this will only rate-limit this particular script; if you execute another one in parallel, it will keep its own counter. Alternatively, you could store the shared state (for example, the time of the last request) in a flat file or a database and check it every time you want to call the API, as sketched below.
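Here is a minimal sketch of that idea using a flat file plus flock(), so the limit is shared across processes. The lock file path, the helper name waitForApiSlot(), and the 20-per-minute limit are assumptions:

function waitForApiSlot($lockFile = '/tmp/api_rate.lock', $maxRequestsPerMin = 20)
{
    $delay = 60 / $maxRequestsPerMin;      // minimum seconds between requests

    $fp = fopen($lockFile, 'c+');
    flock($fp, LOCK_EX);                   // serialise access across processes

    $last = (float) stream_get_contents($fp);
    $wait = $delay - (microtime(true) - $last);
    if ($wait > 0) {
        usleep((int) ($wait * 1000000));
    }

    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, (string) microtime(true)); // record when this request was allowed

    flock($fp, LOCK_UN);
    fclose($fp);
}

// Usage: call waitForApiSlot() immediately before each cURL request.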
For more advanced usage you should look into message queues or ReactPHP. You do not want to hang your server if this functionality is exposed to end users.
My website has a script that will call an external API when a user visits a particular page.
An API request will be made when the page is accessed and a response is returned in xml format.
I am using the usual curl requests.
Right now, due to new implementations on the API side, if the API gets too many requests it will throw an exception and deny the request.
I want to limit the total calls to the API from my website to only 8 times per second.
How can I achieve this? Someone suggested me about queuing the requests but I've never done something like this before and I'm having a hard time finding a solution.
Sorry if my English has errors. Any help is appreciated.
For example: if 100 users access the web page at the same time, I need to queue those API requests, 8 per second, until all of them are done.
I suggest you use your API to generate a token, match that token on every request, and expire or delete the token after some time. That may resolve your multiple-request issue.
function callAPI()
{
    static $currentCount = 0;
    static $currentSeconds = null;

    if ($currentCount < 8 || date("s") != $currentSeconds)
    {
        if (date("s") != $currentSeconds)
        {
            // a new second has started, so reset the counter
            $currentCount = 0;
        }
        $currentSeconds = date("s");

        //call your API here

        $currentCount++;
    }
}
For each API call:
- Record the current time (just the seconds) in a variable.
- Make your API call.
- Increment a call counter.
- Check again if the current seconds equal the previously stored value and if your call counter is under 8. If your call counter is under 8, you may make another call.
You can delay each API request by some number of microseconds; here is sample code:
usleep(1250000); // 1,000,000 microseconds = 1 second, so this pauses for 1.25 seconds
function _callAPI(){
// Your code here
}
When a user visits your site, the request will fire after that short delay; in this way you can throttle the requests.
You can also maintain a log of when each API request was fired and, based on the previous request, delay the next one.
A limit of 8 calls per second is low, so you can save each call attempt in a database and calculate the number of calls over the last 5 seconds on every request.
For larger values, counters in a NoSQL database such as Cassandra or Aerospike are usually used.
I.e. for each request you get the current time and increment a counter named "counter" + second, until you reach your desired limit.
Aerospike is best for this if the load is really high (1000+ calls per second); it gives very low latency.
Cassandra is simpler to use and requires less memory.
Memcached requires even less memory.
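A minimal sketch of that per-second counter idea using Memcached; the key prefix, the 8-per-second limit, and the helper name allowRequest() are assumptions:

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

function allowRequest(Memcached $mc, $limit = 8)
{
    $key = 'api_counter_' . time();   // one counter per second of wall-clock time
    $mc->add($key, 0, 10);            // create the counter if missing, expire it after 10 s
    $count = $mc->increment($key);
    return $count !== false && $count <= $limit;
}

// Usage: wait until a slot frees up before calling the API.
while (!allowRequest($mc)) {
    usleep(100000); // back off 100 ms and retry
}
// call the API here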
In a nutshell: I want to have an overall timeout for a call to runTasks() in a Gearman client.
I feel like I can't be the first person to want this, but I can't find an example of how to put it together.
Here's what I want to achieve:
1. In a PHP script, use the Gearman client to start a series of jobs in parallel.
2. Each job will produce some search results, which the PHP script will need to process.
3. Some of the jobs may take some time to run, but I don't want to wait for the slowest. Instead, after N milliseconds, I want to process the results from all of the jobs that have completed, and abort or ignore those that haven't.
Requirements 1 and 2 are simple enough using the PHP GearmanClient's addTask() and runTasks() methods, but this blocks until all the submitted jobs are completed, so doesn't meet requirement 3.
Here are some approaches I've tried so far:
The timeout set with setTimeout() measures the time the connection has been idle, which isn't what I'm interested in.
Using background jobs or tasks, there doesn't seem to be any way of retrieving the data returned by the worker. There are several questions already covering this: 1 2
The custom polling loop in the example for addTaskStatus() is almost what I need, but it uses background jobs, so again can't see any results. It also includes the vague comment "a better method would be to use event callbacks", without explaining what callbacks it means, or which part of the example they'd replace.
The client options include a GEARMAN_CLIENT_NON_BLOCKING mode, but I don't understand how to use it, or whether a non-blocking runTasks() is any different from using addTaskBackground() instead of addTask().
I've seen suggestions that return communication could just use a different mechanism, like a shared data store, but in that case, I might as well ditch Gearman and build a custom solution with RabbitMQ.
I think I've found a workable solution, although I'd still be interested in alternatives.
The key is that calling runTasks() again after an I/O timeout continues to wait for the previous synchronous tasks, so you can build a polling loop out of these parts:
Synchronous, parallel, tasks set up with addTask().
A completion callback set with setCompleteCallback() which tracks which tasks have finished and how many are still pending.
A low I/O timeout set with setTimeout() which acts as your polling frequency.
Repeated calls to runTasks() in a loop, exiting when either all tasks are done, or an overall timeout is reached. This could also have more complex exit conditions, like "after N seconds, or at least X results", etc.
The big downside is that the timeouts issue a PHP Warning, so you have to squash that with a custom error handler or the @ operator.
Here's a fully tested example:
// How long should we wait each time around the polling loop if nothing happens
define('LOOP_WAIT_MS', 100);
// How long in total should we wait for responses before giving up
define('TOTAL_TIMEOUT_MS', 5000);
$client= new GearmanClient();
$client->addServer();
// This will fire as each job completes.
// In real code, this would save the data for later processing,
// as well as tracking which jobs were completed, tracked here with a simple counter.
$client->setCompleteCallback(function(GearmanTask $task) use (&$pending) {
    $pending--;
    echo "Complete!\n";
    echo $task->data();
});
// This array can be used to track the tasks created. This example just counts them.
$tasks = [];
// Sample tasks; the workers sleep for specified number of seconds before returning some data.
$tasks[] = $client->addTask('wait', '2');
$tasks[] = $client->addTask('wait', '2');
$tasks[] = $client->addTask('wait', '2');
$tasks[] = $client->addTask('wait', '2');
$tasks[] = $client->addTask('wait', '2');
$tasks[] = $client->addTask('wait', '2');
$pending = count($tasks);
// This is the key polling loop; runTasks() here acts as "wait for a notification from the server"
$client->setTimeout(LOOP_WAIT_MS);
$start = microtime(true);
do {
    // This will abort with a PHP Warning if no data is received in LOOP_WAIT_MS milliseconds.
    // We ignore the warning, and try again, unless we've reached our overall time limit.
    @$client->runTasks();
} while (
    // Exit the loop once we run out of time
    microtime(true) - $start < TOTAL_TIMEOUT_MS / 1000
    // Additional loop exit if all tasks have been completed.
    // This counter is decremented in the complete callback.
    && $pending > 0
);
echo "Finished with $pending tasks unprocessed.\n";
Your use case sounds like what CAN_DO_TIMEOUT was created for:
CAN_DO_TIMEOUT
Same as CAN_DO, but with a timeout value on how long the job
is allowed to run. After the timeout value, the job server will
mark the job as failed and notify any listening clients.
Arguments:
- NULL byte terminated Function name.
- Timeout value.
So for any (Worker,Function) tuple you can define a maximum time the Worker will process a Job, otherwise it'll be discarded.
Unfortunately there appears to be a bug in the C Server where the timeout is hard-coded at 1000 seconds.
One workaround is to implement your timeout logic outside of Gearman, if you're able to. For example, if you're using curl, SOAP, sockets, etc., you can often achieve the desired effect by tweaking those settings, as in the sketch below.
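A rough sketch of such a client-side timeout with cURL; the URL and the timeout values are placeholders, not part of the original answer:

$ch = curl_init('https://api.example.com/search'); // hypothetical endpoint
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT_MS, 1000); // give up on connecting after 1 s
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 500);         // abort the whole transfer after 500 ms

$result = curl_exec($ch);
if ($result === false) {
    // treat a timeout the same as a missing result and move on
}
curl_close($ch);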
So, I'm requesting data from an API.
So far, my API key is limited to:
10 requests every 10 seconds
500 requests every 10 minutes
Basically, I want to request a specific value from every game the user has played.
That's about 300 games, for example.
So I have to make 300 requests with my PHP. How can I slow them down to observe the rate limit?
(It can take time, site does not have to be fast)
I tried sleep(), which resulted in my script crashing. Any other ways to do this?
I suggest setting up a cron job that executes every minute, or even better using Laravel's scheduler, rather than using sleep or usleep to imitate a cron.
Here is some information on both:
https://laravel.com/docs/5.1/scheduling
http://www.cyberciti.biz/faq/how-do-i-add-jobs-to-cron-under-linux-or-unix-oses/
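If the app happens to be a Laravel project, a minimal sketch of the scheduler setup could look like this; the command name games:fetch-batch and its class are assumptions for illustration:

// app/Console/Kernel.php (sketch)
namespace App\Console;

use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    protected $commands = [
        \App\Console\Commands\FetchGamesBatch::class, // hypothetical command class
    ];

    protected function schedule(Schedule $schedule)
    {
        // Run a small batch every minute so the API limits are never exceeded
        $schedule->command('games:fetch-batch')->everyMinute();
    }
}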
This sounds like a perfect use for the set_time_limit() function. This function allows you to specify how long your script is allowed to execute, in seconds. For example, if you call set_time_limit(45); at the beginning of your script, then the script may run for at most 45 seconds. One great feature of this function is that you can allow your script to execute indefinitely (no time limit) by calling set_time_limit(0);.
You may want to write your script using the following general structure:
<?php
// Ignore user aborts and allow the script
// to run forever
ignore_user_abort(true);
set_time_limit(0);

// Define a constant for how much time must pass between batches of connections:
define('TIME_LIMIT', 10); // Seconds between batches of API requests

$tLast = 0;

while( /* Some condition to check if there are still API connections that need to be made */ ){
    if( time() >= ($tLast + TIME_LIMIT) ){ // Check if TIME_LIMIT seconds have passed since the last connection batch
        // TIME_LIMIT seconds have passed since the last batch of connections
        /* Use cURL multi to make 10 asynchronous connections to the API */

        // Once all of those connections are made and processed, save the current time:
        $tLast = time();
    }else{
        // TIME_LIMIT seconds have not yet passed
        // Calculate the number of seconds remaining until TIME_LIMIT seconds have passed:
        $timeDifference = $tLast + TIME_LIMIT - time();
        sleep( $timeDifference ); // Sleep for the calculated number of seconds
    }
} // END WHILE-LOOP

/* Do any additional processing, computing, and output */
?>
Note: In this code snippet, I am also using the ignore_user_abort() function. As noted in the comment on the code, this function just allows the script to ignore a user abort, so if the user closes the browser (or connection) while your script is still executing, the script will continue retrieving and processing the data from the API anyway. You may want to disable that in your implementation, but I will leave that up to you.
Obviously this code is very incomplete, but it should give you a decent understanding of how you could possibly implement a solution for this problem.
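In case it helps, here is a rough sketch of what the "cURL multi" placeholder in the loop above could look like; the $urls array and the batch size are assumptions:

$urls = [ /* up to 10 API request URLs for this batch */ ];

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Run all handles until every transfer has finished
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-spinning
} while ($running > 0);

// Collect the responses and clean up
$responses = [];
foreach ($handles as $ch) {
    $responses[] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);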
Don't slow the individual requests down.
Instead, you'd typically use something like Redis to keep track of requests per IP or per user. Once the limit is hit for a time period, reject (perhaps with an HTTP 429 status code) until the count resets.
http://redis.io/commands/INCR coupled with http://redis.io/commands/expire would easily do the trick.
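A minimal sketch of that INCR + EXPIRE pattern using the phpredis extension; the key format, the 20-per-minute limit, and the $userId variable are assumptions:

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

function isRateLimited(Redis $redis, $userId, $limit = 20, $window = 60)
{
    // One counter per user per time window
    $key = 'rate:' . $userId . ':' . floor(time() / $window);

    $count = $redis->incr($key);
    if ($count === 1) {
        $redis->expire($key, $window); // start the countdown on the first hit
    }
    return $count > $limit;
}

if (isRateLimited($redis, $userId)) {
    http_response_code(429); // Too Many Requests
    exit;
}
// otherwise, handle the request normally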
I'm building a small PHP/JavaScript app which will do some processing for all cities in all US states. This adds up to a total of (52 x 25583) = 1330316 or fewer items that will need to be processed.
The processing of each item will take about 2-3 seconds, so its possible that the user could have to stare at this page for 1-2 hours (or at least keep it minimized while he did other stuff).
In order to give the user the maximum feedback, I was thinking of controlling the processing of the page via javascript, basically something like this:
var current = 1;
var max = userItems.length; // 1330316 or less

process();

function process()
{
    if (current >= max)
    {
        alert('done');
        return;
    }

    $.post("http://example.com/process", {id: current}, function()
    {
        $("#current").html(current);
        current++;
        process();
    });
}
In the html i will have the following status message which will be updated whenever the process() function is called:
<div id="progress">
Please wait while items are processed.
<span id="current">0</span> / <span id="max">1330316</span> items have been processed.
</div>
Hopefully you can all see how I want this to work.
My only concern is that, if those 1330316 requests are made simultaneously to the server, is there a possibility that this crashes/brings down the server? If so, would putting in an extra wait of a few seconds per request using sleep(3); in the server-side PHP code make things better?
Or is there a different mechanism for showing the user the rapid feedback such as polling which doesn't require me to mess with apache or the server?
If you can place a cron job on the server, I believe it'd work much better. What about using a cron job to do the actual processing and using JavaScript to periodically poll the status (say, every 10 seconds)?
Then, the first step would be to set some flag that the cron job's PHP will check. If it's active, then the task must be performed (you could use some temporary file to tell the script which records must be processed).
The cronjob would do the task and then, when its iteration is complete, turn off the flag.
This way, the user can even close your application and check it back later, and the server will handle all the processing, uninterrupted by client activity.
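A rough sketch of how the cron job's entry point could check the flag and report progress for the JavaScript poller; the file paths, batch size, and JSON status format are assumptions:

$flagFile   = '/tmp/city_processing.flag';
$statusFile = '/tmp/city_processing_status.json';

if (!file_exists($flagFile)) {
    exit; // flag is off: nothing to process on this run
}

$status = file_exists($statusFile)
    ? json_decode(file_get_contents($statusFile), true)
    : ['done' => 0, 'total' => 1330316];

// Process one batch of records, then record progress for the poller to read
$batchSize = min(20, $status['total'] - $status['done']);
// ... perform the actual per-city processing for $batchSize items here ...
$status['done'] += $batchSize;
file_put_contents($statusFile, json_encode($status));

if ($status['done'] >= $status['total']) {
    unlink($flagFile); // the whole run is complete: turn the flag off
}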
Putting a sleep inside your server-side PHP script can only make it worse. It leads to more processes sticking around, which increases the number of parallel working/sleeping processes, which in turn increases memory usage.
Don't fear that so many requests will be processed in parallel. Usually an Apache server is configured to process no more than 150 requests in parallel. A well-configured server does not process more requests in parallel than it has resources for (good administrators do some calculations beforehand). The other requests have to wait, and given your count of requests it's probable that they would time out before being processed.
Your concerns should, however, be about client-side resources, but it looks like your script only starts a new request when the previous one has returned. BTW: well-behaved HTTP clients (which your browser should be) start no more than 6 requests in parallel to the same IP.
Update: Besides the above, you should seriously consider redesigning your approach to mass processing (similar to what @Joel suggested), but this should go in another question.