I'm using curl_multi to process multiple API requests in parallel.
However, I've noticed there is a lot of fluctuation in the time it takes to complete the requests.
Is this related to the speed of the APIs themselves, or the timeout I set on curl_multi_select? Right now it is 0.05. Should it be less? How can I know this process is finishing the requests as fast as possible without wasted time in between checks to see if they're done?
<?php
// Build the multi-curl handle, adding each curl handle
$handles = array(/* Many curl handles */);
$mh = curl_multi_init();
foreach ($handles as $curl) {
    curl_multi_add_handle($mh, $curl);
}
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh, 0.05); // Should this value be less than 0.05?
} while ($running > 0);
// Remove and close the handles
foreach ($handles as $curl) {
    curl_multi_remove_handle($mh, $curl);
}
curl_multi_close($mh);
?>
The current implementation of curl_multi_select() in PHP doesn't block and doesn't respect the timeout parameter; maybe it will be fixed later. The proper way of waiting is not implemented in your code: it has to be two loops. I will post some tested code from my bot as an example:
$running = 1;
while ($running)
{
    # execute request
    if ($a = curl_multi_exec($this->murl, $running)) {
        throw BotError::text("curl_multi_exec[$a]: ".curl_multi_strerror($a));
    }
    # check finished
    if (!$running) {
        break;
    }
    # wait for activity
    while (!$a)
    {
        if (($a = curl_multi_select($this->murl, $wait)) < 0)
        {
            throw BotError::text(
                ($a = curl_multi_errno($this->murl))
                    ? "curl_multi_select[$a]: ".curl_multi_strerror($a)
                    : 'system select failed'
            );
        }
        usleep($wait * 1000000); # wait for some time <1sec
    }
}
Doing
$running = null;
for (;;) {
    curl_multi_exec($mh, $running);
    if ($running < 1) {
        break;
    }
    curl_multi_select($mh, 1);
}
should be better; that way you avoid a useless select() when nothing is running.
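If you also want to consume results with as little idle time as possible, curl_multi_info_read can report each transfer the moment it completes. A sketch (untested, assuming the handles were created with CURLOPT_RETURNTRANSFER so curl_multi_getcontent works):
$running = null;
do {
    curl_multi_exec($mh, $running);
    // Drain completion messages; each one marks one finished transfer
    while ($info = curl_multi_info_read($mh)) {
        if ($info['msg'] === CURLMSG_DONE) {
            $content = curl_multi_getcontent($info['handle']);
            // ... process $content here, then free the handle
            curl_multi_remove_handle($mh, $info['handle']);
        }
    }
    if ($running) {
        curl_multi_select($mh, 1.0); // block up to 1s waiting for socket activity
    }
} while ($running > 0);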
I have an API that I call at least 10 times at the same time with different information.
This is the function I am currently using.
$mh = curl_multi_init();
$arr = array();
$rows = array();
while ($row = mysqli_fetch_array($query)) {
    array_push($arr, initiate_curl($row, $mh));
    array_push($rows, $row);
}
$running = null;
for (;;) {
    curl_multi_exec($mh, $running);
    if (!$running) {
        break;
    }
    curl_multi_select($mh);
    usleep(1);
}
sleep(1);
foreach ($arr as $curl) {
    curl_multi_remove_handle($mh, $curl);
}
curl_multi_close($mh);
foreach ($arr as $key => $curl) {
    $result = curl_multi_getcontent($curl);
    $dat = simplexml_load_string($result);
    check_time($dat, $rows[$key], $fp);
}
It works fine when the number of requests is small, but when it grows, some of the curl handles do not bring back the appropriate data, i.e. they return null. I am guessing this is because the script moves on before everything has finished.
What can I do to make this work? I am inexperienced with PHP and servers, and I am having a hard time going through the documentation.
If I create another PHP file in which I curl the API to do stuff with the data, and multi-curl that PHP file, would it work better? (In that case it won't be that important that some of the calls do not return the data.) Would that overload my server constantly?
This is not a curl issue; it is a server utilisation issue, so try upgrading the server.
Useful links: link1 , link2
Note: Be careful when using an infinite loop [for(;;)] and regularly monitor CPU utilisation.
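Apart from upgrading, you can also keep a large batch from hammering the API all at once by capping how many transfers run concurrently. A minimal sketch (assuming PHP >= 7.0.7 with libcurl >= 7.30.0, where CURLMOPT_MAX_TOTAL_CONNECTIONS is available; the limit of 10 is illustrative):
$mh = curl_multi_init();
// Run at most 10 transfers at once; libcurl queues the remaining handles
// internally and starts them as earlier transfers finish.
curl_multi_setopt($mh, CURLMOPT_MAX_TOTAL_CONNECTIONS, 10);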
I would like to time my individual requests. They were timed before as separate curl requests. Now I would like to combine them with curl_multi_exec and continue to time how long they take. Here is my code:
$mh = curl_multi_init();
curl_multi_add_handle($mh, $ebayRequest);
curl_multi_add_handle($mh, $prosperentRequest);
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
$ebay = curl_multi_getcontent($ebayRequest);
$prosperent = curl_multi_getcontent($prosperentRequest);
//close the handles
curl_multi_remove_handle($mh, $ebayRequest);
curl_multi_remove_handle($mh, $prosperentRequest);
curl_multi_close($mh);
Here is the old code which timed them.
$ebayTime = microtime(true);
// $ebay = $this->ebayExecuteSearch($ebay);
Yii::warning("Ebay time: ".(microtime(true) - $ebayTime));
I thought curl_multi_info_read might help, but it doesn't look useful.
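For what it's worth, curl_multi_info_read can be combined with curl_getinfo for this. A sketch (untested), reusing the $ebayRequest and $prosperentRequest handles from above: each CURLMSG_DONE message identifies a finished handle, and libcurl tracks total transfer time per handle:
$running = null;
do {
    curl_multi_exec($mh, $running);
    // Each message marks one handle that has just finished
    while ($info = curl_multi_info_read($mh)) {
        if ($info['msg'] === CURLMSG_DONE) {
            $h = $info['handle'];
            $label = ($h === $ebayRequest) ? 'Ebay' : 'Prosperent';
            // CURLINFO_TOTAL_TIME: total transfer time in seconds for this handle
            Yii::warning($label.' time: '.curl_getinfo($h, CURLINFO_TOTAL_TIME));
        }
    }
    curl_multi_select($mh);
} while ($running > 0);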
Here's a typical multi-curl request example for PHP:
$mh = curl_multi_init();
foreach ($urls as $index => $url) {
    $curly[$index] = curl_init($url);
    curl_setopt($curly[$index], CURLOPT_RETURNTRANSFER, 1);
    curl_multi_add_handle($mh, $curly[$index]);
}
// execute the handles
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
// get content and remove handles
$result = array();
foreach ($curly as $index => $c) {
    $result[$index] = curl_multi_getcontent($c);
    curl_multi_remove_handle($mh, $c);
}
// all done
curl_multi_close($mh);
The process involves 3 steps:
1. Preparing data
2. Sending requests and waiting until they're finished
3. Collecting responses
I'd like to split step #2 into two parts: first send all the requests, then do some useful job instead of just waiting (for example, processing the responses of the last group of requests), and then collect all the responses.
So, how can I split this part of the code
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
into separate parts?
Send all the requests
Do some other job while waiting
Retrieve all the responses
I tried it like this:
// Send the requests
$running = null;
curl_multi_exec($mh, $running);
// Do some job while waiting
// ...
// Get all the responses
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
but it doesn't seem to work properly.
I found the solution. Here's how to launch all the requests without waiting for the responses:
do {
    curl_multi_exec($mh, $running);
} while (curl_multi_select($mh) === -1);
Then we can do any other jobs and collect the responses any time later.
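Putting it together, a rough sketch of the whole split flow (untested; same $mh as above, all handles created with CURLOPT_RETURNTRANSFER):
// Phase 1: start the transfers without blocking
$running = null;
do {
    $status = curl_multi_exec($mh, $running);
} while ($status === CURLM_CALL_MULTI_PERFORM);
// Phase 2: do some other useful job here while the requests are in flight
// Phase 3: drive the remaining transfers to completion, then collect content
while ($running > 0) {
    curl_multi_select($mh, 1.0); // wait up to 1s for socket activity
    do {
        $status = curl_multi_exec($mh, $running);
    } while ($status === CURLM_CALL_MULTI_PERFORM);
}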
I have a script which takes a some.txt file, reads the links in it, and returns whether my website's backlink is present or not. But the problem is that it is very slow, and I want to increase its speed. Is there any way to increase its speed?
<?php
ini_set('max_execution_time', 3000);
$source = file_get_contents("your-backlinks.txt");
$needle = "http://www.submitage.com";
$found = array();
$notfound = array();
$new = explode("\n", $source);
foreach ($new as $check) {
    $a = file_get_contents(trim($check));
    if (strpos($a, $needle) !== false) {
        $found[] = $check;
    } else {
        $notfound[] = $check;
    }
}
echo "Matches that were found: \n".implode("\n", $found)."\n";
echo "Matches that were not found \n".implode("\n", $notfound);
?>
Your biggest bottleneck is that you are executing the HTTP requests in sequence, not in parallel. curl is able to perform multiple requests in parallel. Here's an example from the documentation, heavily adapted to use a loop and actually collect the results. I cannot promise it's correct; I only promise I've followed the documentation correctly:
$mh = curl_multi_init();
$handles = array();
foreach ($new as $check) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $check);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // needed for curl_multi_getcontent below
    curl_multi_add_handle($mh, $ch);
    $handles[$check] = $ch;
}
// verbatim from the demo
$active = null;
// execute the handles
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
// end of verbatim code
foreach ($handles as $check => $ch) {
    $a = curl_multi_getcontent($ch);
    // ...
}
You won't be able to squeeze any more speed out of the operation by optimizing the PHP, except maybe some faux-multithreading solution.
However, you could create a queue system that would allow you to run the check as a background task. Instead of checking the URLs as you iterate through them, add them to the queue instead. Then write a cron script that grabs unchecked URLs from the queue one by one, checks if they contain a reference to your domain and saves the result.
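A minimal sketch of that queue idea, with a hypothetical url_queue table (columns id, url, checked, found) and placeholder credentials; meant to run from cron:
// cron.php - grab a slice of unchecked URLs and record the results
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$needle = 'http://www.submitage.com';
$res = $db->query('SELECT id, url FROM url_queue WHERE checked = 0 LIMIT 20');
while ($row = $res->fetch_assoc()) {
    $body = file_get_contents(trim($row['url']));
    $found = ($body !== false && strpos($body, $needle) !== false) ? 1 : 0;
    $stmt = $db->prepare('UPDATE url_queue SET checked = 1, found = ? WHERE id = ?');
    $stmt->bind_param('ii', $found, $row['id']);
    $stmt->execute();
}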
I am looking for a curl function which can open a particular number of webpages at a time; ideally there would be no output, or return-data set to false would be even better. I need to access 5-10 URLs at the same time. I have heard about curl multi-threading, but I don't have a proper function or class to use for it.
I found some by searching, but most of them seem to use a loop, meaning they don't use simultaneous connections, just one after another! I want something which can open multiple connections at a time, not one by one!
I made one:
function mutload($url) {
    if (!is_array($url)) {
        exit;
    }
    for ($i = 0; $i < count($url); $i++) {
        // create a cURL resource for each URL
        $ch[$i] = curl_init();
        // set URL and other appropriate options
        curl_setopt($ch[$i], CURLOPT_URL, $url[$i]);
        curl_setopt($ch[$i], CURLOPT_HEADER, 0);
        curl_setopt($ch[$i], CURLOPT_RETURNTRANSFER, 0);
    }
    //create the multiple cURL handle
    $mh = curl_multi_init();
    for ($i = 0; $i < count($url); $i++) {
        //add the handles
        curl_multi_add_handle($mh, $ch[$i]);
    }
    $active = null;
    //execute the handles
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    while ($active && $mrc == CURLM_OK) {
        if (curl_multi_select($mh) != -1) {
            do {
                $mrc = curl_multi_exec($mh, $active);
            } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        }
    }
    //close the handles
    for ($i = 0; $i < count($url); $i++) {
        curl_multi_remove_handle($mh, $ch[$i]);
    }
    curl_multi_close($mh);
}
OK! But I'm confused: will it connect to all the URLs at a time, or one by one? Moreover, I am getting the content as well, and I only want to connect or send a request to the site; I don't need any content from it. I used CURLOPT_RETURNTRANSFER set to false, but it didn't work. Please help, thanks!
You're looking for the curl_multi_* family of functions. Have a look at curl_multi_exec.
Set CURLOPT_NOBODY to prevent curl from downloading any content.
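For example, a sketch (untested) of how each handle might be set up before being added to the multi handle:
$ch = curl_init($url);
// HEAD-style request: connect and read response headers, but skip the body
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep whatever comes back off the page
curl_multi_add_handle($mh, $ch);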
I didn't test your code, but curl_multi adds items to a queue from a loop and processes them in parallel. Sometimes there can be issues if you are trying to load hundreds of URLs, but it should be fine for a few URLs. If you have long DNS lookups or slow servers, all your results will have to wait for the slowest request.
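For the hundreds-of-URLs case, one common mitigation (a sketch, untested; $urls and the window size of 10 are illustrative) is a rolling window: keep a fixed number of handles in the multi handle and top it up as transfers finish:
$queue = $urls; // URLs still waiting to be started
$mh = curl_multi_init();
$inFlight = 0;
// helper: move one URL from the queue into the multi handle
$start = function () use (&$queue, &$inFlight, $mh) {
    $ch = curl_init(array_shift($queue));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_multi_add_handle($mh, $ch);
    $inFlight++;
};
// prime the window
while ($queue && $inFlight < 10) {
    $start();
}
while ($inFlight > 0) {
    curl_multi_exec($mh, $running);
    while ($info = curl_multi_info_read($mh)) {
        if ($info['msg'] === CURLMSG_DONE) {
            // consume curl_multi_getcontent($info['handle']) here
            curl_multi_remove_handle($mh, $info['handle']);
            $inFlight--;
            if ($queue) {
                $start(); // top the window back up
            }
        }
    }
    if ($inFlight > 0) {
        curl_multi_select($mh, 1.0);
    }
}
curl_multi_close($mh);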
This code is tested and should work; it is somewhat similar to yours:
http://www.onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/