I have an API that I call at least 10 times at the same time with different information.
This is the code I am currently using.
$mh = curl_multi_init();
$arr = array();
$rows = array();
while ($row = mysqli_fetch_array($query)) {
    array_push($arr, initiate_curl($row, $mh));
    array_push($rows, $row);
}
$running = null;
for (;;) {
    curl_multi_exec($mh, $running);
    if (!$running) {
        break;
    }
    curl_multi_select($mh);
    usleep(1);
}
sleep(1);
foreach ($arr as $curl) {
    curl_multi_remove_handle($mh, $curl);
}
curl_multi_close($mh);
foreach ($arr as $key => $curl) {
    $result = curl_multi_getcontent($curl);
    $dat = simplexml_load_string($result);
    check_time($dat, $rows[$key], $fp);
}
It works fine when the number of requests is small, but when it grows, some of the curl handles do not bring back the appropriate data, i.e. they return null, and I am guessing it is because the script moves on before the response has come back.
What can I do to make this work? I am inexperienced with PHP and servers and am having a hard time going through the documentation.
If I create another PHP file that curls the API and does the work with the data, and then multi-curl that PHP file instead, would it work better? (In that case it would not matter as much if some of the calls do not return data.) Would that overload my server constantly?
This is not a curl issue; it is a server utilisation issue, so try upgrading the server.
Note: Be careful when using an infinite loop [for(;;)] and regularly monitor CPU utilisation.
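Before upgrading anything, it may be worth confirming whether the empty responses are actually failed transfers. Below is a minimal diagnostic sketch, assuming the $mh and $arr variables from the question and PHP 5.5+ for curl_multi_strerror(); it has to run after the exec loop finishes but before the handles are removed and the multi handle is closed:
// Drain the multi handle's message queue once the exec loop has finished,
// but before curl_multi_remove_handle()/curl_multi_close() are called.
while ($info = curl_multi_info_read($mh)) {
    if ($info['result'] !== CURLE_OK) {
        echo 'failed: ' . curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL)
            . ' - ' . curl_multi_strerror($info['result']) . "\n";
    }
}
Any handle reported here with a timeout or connection error is one of the ones that will give you null from simplexml_load_string().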
I'm using curl_multi to process multiple API requests in parallel.
However, I've noticed there is a lot of fluctuation in the time it takes to complete the requests.
Is this related to the speed of the APIs themselves, or the timeout I set on curl_multi_select? Right now it is 0.05. Should it be less? How can I know this process is finishing the requests as fast as possible without wasted time in between checks to see if they're done?
<?php
// Build the multi-curl handle, adding each curl handle
$handles = array(/* Many curl handles */);
$mh = curl_multi_init();
foreach ($handles as $curl) {
    curl_multi_add_handle($mh, $curl);
}
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh, 0.05); // Should this value be less than 0.05?
} while ($running > 0);
// Close the handles
foreach ($handles as $curl) {
    curl_multi_remove_handle($mh, $curl);
}
curl_multi_close($mh);
?>
The current implementation of curl_multi_select() in PHP doesn't block and doesn't respect the timeout parameter; maybe it will be fixed later. The proper way of waiting is not implemented in your code: it has to be two loops. I will post some tested code from my bot as an example:
$running = 1;
while ($running)
{
    # execute request
    if ($a = curl_multi_exec($this->murl, $running)) {
        throw BotError::text("curl_multi_exec[$a]: ".curl_multi_strerror($a));
    }
    # check finished
    if (!$running) {
        break;
    }
    # wait for activity
    while (!$a)
    {
        if (($a = curl_multi_select($this->murl, $wait)) < 0)
        {
            throw BotError::text(
                ($a = curl_multi_errno($this->murl))
                    ? "curl_multi_select[$a]: ".curl_multi_strerror($a)
                    : 'system select failed'
            );
        }
        usleep($wait * 1000000); # wait for some time <1sec
    }
}
Doing
$running = null;
for (;;) {
    curl_multi_exec($mh, $running);
    if ($running < 1) {
        break;
    }
    curl_multi_select($mh, 1);
}
should be better; then you'll avoid a useless select() when nothing is running.
My situation:
I have multiple servers running a raw TCP API that requires me to send a string to get information from them. I need to get a response within a timeout of 5 seconds. All APIs should be contacted at the same time, and from there on they get 5 seconds to respond. (So the maximum execution time is 5 seconds for all servers at once.)
I already managed to do so for HTTP/S APIs with PHP cURL:
// array of curl handles
$multiCurl = array();
// data to be returned
$result = array();
// multi handle
$mh = curl_multi_init();
foreach ($row_apis as $api) {
    $id = $api[0];
    $ip = $api[1];
    $port = $api[2];
    // URL from which data will be fetched
    $fetchURL = "$ip:$port/api/status";
    $multiCurl[$id] = curl_init();
    curl_setopt($multiCurl[$id], CURLOPT_URL, $fetchURL);
    //curl_setopt($multiCurl[$id], CURLOPT_HEADER, 0);
    curl_setopt($multiCurl[$id], CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    curl_setopt($multiCurl[$id], CURLOPT_CUSTOMREQUEST, "GET");
    curl_setopt($multiCurl[$id], CURLOPT_TIMEOUT, 5);
    curl_setopt($multiCurl[$id], CURLOPT_RETURNTRANSFER, 1);
    curl_multi_add_handle($mh, $multiCurl[$id]);
}
$index = null;
do {
    curl_multi_exec($mh, $index);
} while ($index > 0);
// get content and remove handles
foreach ($multiCurl as $k => $ch) {
    $result[$k] = json_decode(curl_multi_getcontent($ch), true);
    curl_multi_remove_handle($mh, $ch);
}
// close
curl_multi_close($mh);
This sample fetches all APIs at once and waits 5 seconds for a response. It will never take longer than 5 seconds.
Is there a way to do the same thing with raw TCP APIs in PHP?
I already tried to use sockets and was able to get the information, but every API is fetched one after another, so the script takes way too long for multiple servers.
Thanks for your help.
EDIT:
I've tried to implement your suggestions and my code now looks like this:
$apis = array();
$apis[0] = array(1, "123.123.123.123", 1880);
$method = "summary";
$sockets = array();
//Create socket array
foreach ($apis as $api) {
    $id = $api[0];
    $ip = $api[1];
    $port = $api[2];
    $sockets[$id] = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    socket_set_nonblock($sockets[$id]);
    #socket_connect($sockets[$id], $ip, $port);
    //socket_write($sockets[$id], $method);
}
//Write to every socket
/*
foreach ($sockets as $socket) {
    socket_write($socket, $method);
    //fwrite($socket, "$method");
}
*/
//Read all sockets for 5 seconds
$write = NULL;
$except = NULL;
$responses = socket_select($sockets, $write, $except, 5);
//Check result
if ($responses === false) {
    echo "Did not work";
}
elseif ($responses > 0) {
    echo "At least one has responded";
}
//Access the data
//???
But I'm getting a 0 as the result of socket_select...
When do I need to write the method to the socket?
And if I do get something back, how do I access the data that was in the response?
Absolutely. Set SO_SNDBUF to the appropriate size, so you can send all the requests instantly/non-blockingly, then send all the requests, then start waiting for/reading the responses.
The easy way to do the reading is to call socket_set_block on the sockets and read all responses one by one, but this doesn't give a hard guarantee of a 5 second timeout (then again, neither does your example curl_multi code). If you need a hard 5 second timeout, use socket_set_nonblock and socket_select instead.
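A rough sketch of that non-blocking approach, building on the code from the edit above: the request has to be written once the connect has completed (the socket becomes writable), and socket_select() has to be called in a loop until the deadline, because it overwrites the arrays you pass in. This is untested against your APIs; the IP, port and $method values are just the placeholders from the question, and the 8192-byte read size is arbitrary.
// Sketch only: parallel raw TCP requests with a hard 5 second deadline.
$apis = array(
    array(1, "123.123.123.123", 1880),     // id, ip, port (placeholder values from the question)
);
$method   = "summary";
$deadline = microtime(true) + 5;

$sockets = array();   // id => socket
$pending = array();   // sockets we still have to write the request to
$results = array();   // id => accumulated response

foreach ($apis as $api) {
    list($id, $ip, $port) = $api;
    $s = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    socket_set_nonblock($s);
    @socket_connect($s, $ip, $port);       // returns false with "operation in progress" - expected
    $sockets[$id] = $s;
    $pending[$id] = $s;
    $results[$id] = '';
}

while ($sockets && ($left = $deadline - microtime(true)) > 0) {
    // socket_select() overwrites these arrays, so pass fresh copies each iteration
    $read   = $sockets;
    $write  = $pending ? $pending : null;
    $except = null;
    $sec    = (int) $left;
    $usec   = (int) (($left - $sec) * 1000000);

    if (socket_select($read, $write, $except, $sec, $usec) < 1) {
        break;                             // deadline reached or select() failed
    }

    // a socket becomes writable once its non-blocking connect has finished
    if ($write) {
        foreach ($write as $id => $s) {
            socket_write($s, $method);
            unset($pending[$id]);
        }
    }

    // read whatever has arrived; an empty string means the peer closed the connection
    foreach ($read as $id => $s) {
        $chunk = socket_read($s, 8192);
        if ($chunk === '' || $chunk === false) {
            socket_close($s);
            unset($sockets[$id], $pending[$id]);
        } else {
            $results[$id] .= $chunk;
        }
    }
}

foreach ($sockets as $s) {
    socket_close($s);                      // anything still open simply hit the deadline
}
print_r($results);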
I am trying to use curl_multi_exec() in PHP with, I am guessing, about 4000 POST calls, collecting the returned JSON. However, after 234 records in my results, my print_r starts showing nothing. I can change which post-call URLs I use, since each of my URLs has a different postfield, but I would still get 234 results. Does anybody know if there are any limits to curl_multi_exec()? I am using an XAMPP server on my computer to retrieve the JSON from a remote server. Is it an option in my XAMPP install that is preventing more results, or a server-side limit on my connections?
Thanks. My code for the function is below. The function takes an input $opt, which is an array of the curl options.
$ch = array();
$results = array();
$mh = curl_multi_init();
foreach ($opt as $handler => $array)
{
    //print_r($array);
    //echo "<br><br>";
    $ch[$handler] = curl_init();
    curl_setopt_array($ch[$handler], $array);
    curl_multi_add_handle($mh, $ch[$handler]);
}
$running = null;
do {
    curl_multi_exec($mh, $running);
} while ($running > 0);
// Get content and remove handles.
foreach ($ch as $key => $val) {
    $results[$key] = json_decode(curl_multi_getcontent($val), true);
    curl_multi_remove_handle($mh, $val);
}
curl_multi_close($mh);
return $results;
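As far as I know, curl_multi_exec() itself doesn't impose a fixed cap on the number of handles, but thousands of simultaneous transfers can hit local limits (open sockets, memory, what the remote server will accept). One hedged workaround sketch is to push the same $opt array through the multi handle in smaller batches; the batch size of 100 below is only a guess:
// Sketch: same logic as the function above, but processing $opt in chunks
// so only ~100 transfers are in flight at once.
$results = array();
foreach (array_chunk($opt, 100, true) as $batch) {
    $mh = curl_multi_init();
    $ch = array();
    foreach ($batch as $handler => $array) {
        $ch[$handler] = curl_init();
        curl_setopt_array($ch[$handler], $array);
        curl_multi_add_handle($mh, $ch[$handler]);
    }
    $running = null;
    do {
        curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0);   // sleep a little instead of spinning
        }
    } while ($running > 0);
    foreach ($ch as $key => $val) {
        $results[$key] = json_decode(curl_multi_getcontent($val), true);
        curl_multi_remove_handle($mh, $val);
        curl_close($val);
    }
    curl_multi_close($mh);
}
return $results;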
When I run a check on 10 URLs, if I am able to get a connection with the host server, the handle will return a success message (CURLE_OK).
When processing each handle, if a server refuses the connection, the handle will include an error message.
The problem
I assumed that when we get a bad handle, CURL will mark this handle but continue to process the unprocessed handles; however, this is not what seems to happen.
When we come across a bad handle, CURL will mark this handle as bad, but will not process the remaining unprocessed handles.
This can be hard to detect: if I do get a connection with all handles, which is what happens most of the time, then the problem is not visible. (CURL only stops on the first bad connection.)
For the test, I had to find a suitable site which loads slowly/refuses a certain number of simultaneous connections.
set_time_limit(0);
$l = array(
    'http://smotri.com/video/list/',
    'http://smotri.com/video/list/sports/',
    'http://smotri.com/video/list/animals/',
    'http://smotri.com/video/list/travel/',
    'http://smotri.com/video/list/hobby/',
    'http://smotri.com/video/list/gaming/',
    'http://smotri.com/video/list/mult/',
    'http://smotri.com/video/list/erotic/',
    'http://smotri.com/video/list/auto/',
    'http://smotri.com/video/list/humour/',
    'http://smotri.com/video/list/film/'
);
$mh = curl_multi_init();
$s = 0;
$f = 10;
while ($s <= $f)
{
    $ch = curl_init();
    $curlsettings = array(
        CURLOPT_URL => $l[$s],
        CURLOPT_TIMEOUT => 0,
        CURLOPT_CONNECTTIMEOUT => 0,
        CURLOPT_RETURNTRANSFER => 1
    );
    curl_setopt_array($ch, $curlsettings);
    curl_multi_add_handle($mh, $ch);
    $s++;
}
$active = null;
do
{
    curl_multi_exec($mh, $active);
    curl_multi_select($mh);
    $info = curl_multi_info_read($mh);
    echo '<pre>';
    var_dump($info);
    if ($info['result'] === CURLE_OK)
        echo curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL) . ' success<br>';
    if ($info['result'] != 0)
        echo curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL) . ' failed<br>';
} while ($active > 0);
curl_multi_close($mh);
I have dumped $info in the script, which asks the multi handle whether there is any new information on any handles whilst running.
When the script has ended we will see some bool(false) entries - when no new information was available (handles were still processing) - along with all handles if everything was successful, or only some handles if one handle failed.
I have failed at fixing this; it's probably something I have overlooked, and I have gone too far down the road attempting to fix things which are not relevant.
Some attempts at fixing this were:
- Assigning each $ch handle to an array - $ch[1], $ch[2], etc. (instead of adding the current $ch handle to the multi handle and then overwriting it, as in the test)
- Removing handles after success/failure with curl_multi_remove_handle
- Setting CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT to infinity
- Many more (I will update this post as I remember them, as I have forgotten most)
Testing this with PHP version 5.4.14.
Hopefully I have illustrated the points well enough.
Thanks for reading.
I've been mucking around with your script for a while now trying to get it to work. It was only when I read "Repeated calls to this function will return a new result each time, until a FALSE is returned as a signal that there is no more to get at this point." at http://se2.php.net/manual/en/function.curl-multi-info-read.php that I realized a while loop might work.
The extra while loop makes it behave exactly how you'd expect. Here is the output I get:
http://smotri.com/video/list/sports/ failed
http://smotri.com/video/list/travel/ failed
http://smotri.com/video/list/gaming/ failed
http://smotri.com/video/list/erotic/ failed
http://smotri.com/video/list/humour/ failed
http://smotri.com/video/list/animals/ success
http://smotri.com/video/list/film/ success
http://smotri.com/video/list/auto/ success
http://smotri.com/video/list/ failed
http://smotri.com/video/list/hobby/ failed
http://smotri.com/video/list/mult/ failed
Here's the code I used for testing:
<?php
set_time_limit(0);
$l = array(
    'http://smotri.com/video/list/',
    'http://smotri.com/video/list/sports/',
    'http://smotri.com/video/list/animals/',
    'http://smotri.com/video/list/travel/',
    'http://smotri.com/video/list/hobby/',
    'http://smotri.com/video/list/gaming/',
    'http://smotri.com/video/list/mult/',
    'http://smotri.com/video/list/erotic/',
    'http://smotri.com/video/list/auto/',
    'http://smotri.com/video/list/humour/',
    'http://smotri.com/video/list/film/'
);
$mh = curl_multi_init();
$s = 0;
$f = 10;
while ($s <= $f)
{
    $ch = curl_init();
    if ($s % 2)
    {
        $curlsettings = array(
            CURLOPT_URL => $l[$s],
            CURLOPT_TIMEOUT_MS => 3000,
            CURLOPT_RETURNTRANSFER => 1,
        );
    }
    else
    {
        $curlsettings = array(
            CURLOPT_URL => $l[$s],
            CURLOPT_TIMEOUT_MS => 4000,
            CURLOPT_RETURNTRANSFER => 1,
        );
    }
    curl_setopt_array($ch, $curlsettings);
    curl_multi_add_handle($mh, $ch);
    $s++;
}
$active = null;
do
{
    $mrc = curl_multi_exec($mh, $active);
    curl_multi_select($mh);
    while ($info = curl_multi_info_read($mh))
    {
        echo '<pre>';
        //var_dump($info);
        if ($info['result'] === 0)
        {
            echo curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL) . ' success<br>';
        }
        else
        {
            echo curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL) . ' failed<br>';
        }
    }
} while ($active > 0);
curl_multi_close($mh);
Hope that helps. For testing, just adjust CURLOPT_TIMEOUT_MS to suit your internet connection. I made it alternate between 3000 and 4000 milliseconds, as 3000 will fail and 4000 usually succeeds.
Update
After going through the PHP and libcurl docs I have found how curl_multi_exec works (in libcurl it's curl_multi_perform). Upon first being called it starts handling transfers for all the added handles (added before via curl_multi_add_handle).
The number it assigns to $active is the number of transfers still running, so if it's less than the total number of handles you have, you know one or more transfers are complete. curl_multi_exec therefore acts as a kind of progress indicator as well.
As all transfers are handled in a non-blocking fashion (transfers can finish simultaneously), the while loop that curl_multi_exec sits in cannot represent one completed URL request per iteration.
All data is stored in a queue, so as soon as one or more transfers are complete you can call curl_multi_info_read to fetch this data.
In my original answer I had curl_multi_info_read in a while loop. This loop would keep iterating until curl_multi_info_read found no remaining data in the queue, after which the outer while loop would move on to the next iteration if $active != 0 (meaning curl_multi_exec reported transfers still not complete).
To summarize, the outer loop keeps iterating when there are still transfers not completed and the inner loop iterates only when there's data from a completed transfer.
The PHP documentation is pretty bad for curl multi functions so I hope this cleared a few things up. Below is an alternative way to do the same thing.
do
{
    curl_multi_exec($mh, $active);
} while ($active > 0);

// while($info = curl_multi_info_read($mh)) would work also here
for ($i = 0; $i <= $f; $i++) {
    $info = curl_multi_info_read($mh);
    if ($info['result'] === 0)
    {
        echo curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL) . ' success<br>';
    }
    else
    {
        echo curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL) . ' failed<br>';
    }
}
From this information you can also see that curl_multi_select is not needed here, as you don't want something that blocks until there is activity.
With the code you provided in your question, it only seemed like curl wasn't proceeding after a few failed transfers; there was actually still data queued in the buffer. Your code just wasn't calling curl_multi_info_read enough times. The reason all the successful transfers were picked up by your code is that PHP runs on a single thread, so the script hung waiting for the requests. The timeouts for the failed requests didn't make PHP hang/wait that long, so the while loop did fewer iterations than there were queued messages.
I have a web portal that needs to download lots of separate JSON files and display their contents in a sort of form view. By lots I mean 32 separate files minimum.
I've tried cURL with brute-force iteration and it's taking ~12.5 seconds.
I've tried curl_multi_exec as demonstrated here http://www.php.net/manual/en/function.curl-multi-init.php with the function below and it's taking ~9 seconds. A little better, but still terribly slow.
function multiple_threads_request($nodes) {
    $mh = curl_multi_init();
    $curl_array = array();
    foreach ($nodes as $i => $url)
    {
        $curl_array[$i] = curl_init($url);
        curl_setopt($curl_array[$i], CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $curl_array[$i]);
    }
    $running = NULL;
    do {
        curl_multi_exec($mh, $running);
    } while ($running > 0);
    $res = array();
    foreach ($nodes as $i => $url)
    {
        $res[$url] = curl_multi_getcontent($curl_array[$i]);
    }
    foreach ($nodes as $i => $url) {
        curl_multi_remove_handle($mh, $curl_array[$i]);
    }
    curl_multi_close($mh);
    return $res;
}
I realize this is an inherently expensive operation, but does anyone know any other alternatives that might be faster?
EDIT: In the end, my system was what was limiting curl_multi_exec, and moving the code to a production machine brought dramatic improvements.
You should definitely look into benchmarking your cURL requests to see which one is causing the slowdown. This was too lengthy for a comment, so let me know whether it helps:
// revert to "cURLing with brute force iteration" as you described it :)
$curl_timer = array();
foreach ($curlsite as $row)
{
    $start = microtime(true);
    /**
     * curl code
     */
    $curl_timer[] = (microtime(true) - $start);
}
echo '<pre>' . print_r($curl_timer, true) . '</pre>';
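If you would rather keep the multi version and still see which request is slow, another option is to read per-handle timing with curl_getinfo() after the do/while loop and before the handles are removed. This is only a sketch, assuming the $nodes and $curl_array names from the function in the question:
// Sketch: collect per-URL timing from the finished easy handles,
// placed inside multiple_threads_request() right after the do/while loop.
$timings = array();
foreach ($nodes as $i => $url) {
    $timings[$url] = array(
        'dns'     => curl_getinfo($curl_array[$i], CURLINFO_NAMELOOKUP_TIME),
        'connect' => curl_getinfo($curl_array[$i], CURLINFO_CONNECT_TIME),
        'total'   => curl_getinfo($curl_array[$i], CURLINFO_TOTAL_TIME),
    );
}
echo '<pre>' . print_r($timings, true) . '</pre>';
That way you can tell whether the slow part is DNS, connecting, or the transfer itself for each URL.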