I am trying to fetch the header info from multiple web pages. I first tried single cURL requests, using the code shown below:
<?php
$arr = array(
"John", "Mary",
"William", " Peter",
"James", "Emma",
"George", "Elizabeth",
"Charles", "Margaret",
);
$ch = curl_init();
for($i=0; $i<sizeOf($arr); $i++){
$url = "https://example.com/".$arr[$i];
$options = array(
CURLOPT_URL => $url,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_ENCODING => "",
CURLOPT_SSL_VERIFYPEER => FALSE,
CURLOPT_AUTOREFERER => true,
CURLOPT_CONNECTTIMEOUT => 120,
CURLOPT_TIMEOUT => 120,
CURLOPT_MAXREDIRS => 10,
);
curl_setopt_array( $ch, $options );
$response = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ( $httpCode != 200 ){
echo $arr[$i]." Error<br>";
} else {
echo $arr[$i]." Success<br>";
}
}
curl_close($ch);
?>
But this code takes a very long time to execute. I searched the internet and found curl_multi_exec, which can run multiple cURL requests at the same time. So now I use this code:
<?php
ini_set('max_execution_time', 0);
$arr = array(
"John", "Mary",
"William", " Peter",
"James", "Emma",
"George", "Elizabeth",
"Charles", "Margaret",
);
function multiRequest($data) {
// array of curl handles
$curly = array();
// data to be returned
$result = array();
// multi handle
$mh = curl_multi_init();
// loop through $data and create curl handles
// then add them to the multi-handle
foreach ($data as $id => $d) {
$curly[$id] = curl_init();
$url = "https://example.com/".$data[$id];
$options = array(
CURLOPT_URL => $url,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_ENCODING => "",
CURLOPT_SSL_VERIFYPEER => FALSE,
CURLOPT_AUTOREFERER => true,
CURLOPT_CONNECTTIMEOUT => 120,
CURLOPT_TIMEOUT => 120,
CURLOPT_MAXREDIRS => 10,
);
// extra options?
if (!empty($options)) {
curl_setopt_array($curly[$id], $options);
}
curl_multi_add_handle($mh, $curly[$id]);
}
// execute the handles
$running = null;
do {
curl_multi_exec($mh, $running);
} while($running > 0);
// get content and remove handles
foreach($curly as $id => $c) {
$result[$id] = curl_multi_getcontent($c);
//Code to fetch header info
curl_multi_remove_handle($mh, $c);
}
// all done
curl_multi_close($mh);
return $result;
}
multiRequest($arr);
?>
How can I fetch the header info for each request when using curl_multi_init?
This code from your first example:
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ( $httpCode != 200 ){
echo $arr[$i]." Error<br>";
} else {
echo $arr[$i]." Success<br>";
}
will work even if the curl handle was executed by curl_multi_exec().
In your second example, replace this code:
// get content and remove handles
foreach($curly as $id => $c) {
$result[$id] = curl_multi_getcontent($c);
//Code to fetch header info
curl_multi_remove_handle($mh, $c);
}
with this:
// get content and remove handles
foreach($curly as $id => $c) {
$result[$id] = curl_multi_getcontent($c);
$httpCode = curl_getinfo($c, CURLINFO_HTTP_CODE);
$url = curl_getinfo($c, CURLINFO_EFFECTIVE_URL);
if ( $httpCode != 200 ){
echo $url." Error<br>";
} else {
echo $url." Success<br>";
}
curl_multi_remove_handle($mh, $c);
}
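Since CURLOPT_HEADER is true in the question, the string returned by curl_multi_getcontent() holds the response headers followed by the body. Below is a minimal sketch of how the two could be separated, assuming the header length is taken from curl_getinfo($c, CURLINFO_HEADER_SIZE). The helper name splitHeaders and the shape of its return value are my own and not part of cURL.

```php
<?php
// Hypothetical helper: split a raw cURL response (CURLOPT_HEADER => true)
// into parsed header fields and the body. $headerSize is expected to be
// curl_getinfo($c, CURLINFO_HEADER_SIZE) for the same handle.
function splitHeaders(string $raw, int $headerSize): array
{
    $headerText = substr($raw, 0, $headerSize);
    $body = (string) substr($raw, $headerSize);

    // With CURLOPT_FOLLOWLOCATION there is one header block per hop;
    // keep only the last block, which belongs to the final response.
    $blocks = preg_split('/\r?\n\r?\n/', trim($headerText));
    $last = end($blocks);

    $headers = [];
    foreach (preg_split('/\r?\n/', $last) as $i => $line) {
        if ($i === 0) {
            $headers['status_line'] = $line; // first line of the block
            continue;
        }
        if (strpos($line, ':') !== false) {
            [$name, $value] = explode(':', $line, 2);
            $headers[strtolower(trim($name))] = trim($value);
        }
    }
    return ['headers' => $headers, 'body' => $body];
}
```

Inside the foreach loop above it could be called as splitHeaders($result[$id], curl_getinfo($c, CURLINFO_HEADER_SIZE)), before the handle is removed. If the same header appears more than once, this sketch keeps only the last value.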
Related
I am trying to perform multiple POST REST calls. The catch: the POST calls have to run at the same time. I am aware of the Guzzle library and have worked with it, but I haven't figured out a way to do this properly. I can perform GET calls asynchronously, but I have found nothing equivalent for POST calls. Then I came across pthreads; I read through the documentation but was unsure how to even get started. I have compiled PHP with the pthreads extension.
Could someone advise how to perform multiple POST calls at the same time and gather the responses for later manipulation?
Below is a basic implementation that loops and waits for each request in turn. It is very slow overall.
$postDatas = [
['field' => 'test'],
['field' => 'test1'],
['field' => 'test2'],
];
foreach ($postDatas as $postData) {
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_URL => "https://www.apisite.com",
CURLOPT_RETURNTRANSFER => true,
CURLOPT_ENCODING => "",
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 30,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => "POST",
CURLOPT_POSTFIELDS => json_encode($postData),
CURLOPT_HTTPHEADER => [
"cache-control: no-cache",
"connection: keep-alive",
"content-type: application/json",
"host: some.apisite.com",
],
));
$response = curl_exec($curl);
$err = curl_error($curl);
curl_close($curl);
if ($err) {
echo "cURL Error #:" . $err;
} else {
echo $response;
}
}
If the task comes down to working with the API, then you probably need curl_multi_exec: http://php.net/manual/ru/function.curl-multi-exec.php
public function getMultiUrl() {
// If there are a lot of connections, split the queue into parts
$parts = array_chunk($this->urlStack, self::URL_ITERATION_SIZE , TRUE);
//base options
$options = [
CURLOPT_USERAGENT => 'MyAPP',
CURLOPT_HEADER => false,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_POST => true,
];
foreach ($parts as $urls) {
$mh = curl_multi_init();
$active = null;
$connects = [];
foreach ($urls as $i => $url) {
$options[CURLOPT_POSTFIELDS] = $url['postData'];
$connects[$i] = curl_init($url['queryUrl']);
curl_setopt_array($connects[$i], $options);
curl_multi_add_handle($mh, $connects[$i]);
}
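do {
$status = curl_multi_exec($mh, $active);
// Wait for activity instead of spinning in a busy loop
if ($active) {
curl_multi_select($mh);
}
// Report every transfer that has finished since the last pass
while (false !== ($info = curl_multi_info_read($mh))) {
var_dump($info);
}
} while ($status === CURLM_CALL_MULTI_PERFORM || $active);
foreach ($connects as $i => $conn) {
$content = curl_multi_getcontent($conn);
file_put_contents($this->dir . $i, $content);
curl_multi_remove_handle($mh, $conn);
curl_close($conn);
}
curl_multi_close($mh);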
}
}
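For the JSON bodies used in the question, the same idea can be sketched as a standalone function. This is only an illustration under some assumptions: the function names buildJsonPostOptions and multiPost, the 30 second timeout and the result layout are mine, and error handling is reduced to the status code.

```php
<?php
// Hypothetical helper: cURL options for one JSON POST request.
function buildJsonPostOptions(string $url, array $payload): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => json_encode($payload),
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_TIMEOUT        => 30, // assumed value, taken from the question
    ];
}

// Sends all payloads concurrently and returns
// [key => ['status' => int, 'body' => string]] with the input keys preserved.
function multiPost(string $url, array $payloads): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($payloads as $key => $payload) {
        $handles[$key] = curl_init();
        curl_setopt_array($handles[$key], buildJsonPostOptions($url, $payload));
        curl_multi_add_handle($mh, $handles[$key]);
    }

    // Drive the transfers. curl_multi_select() waits for activity (up to
    // one second), so the loop does not spin at full CPU.
    $active = null;
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh, 1.0);
        }
    } while ($active && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = [
            // 0 means no HTTP response was received for this request
            'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
            'body'   => (string) curl_multi_getcontent($ch),
        ];
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

With the data from the question, the call would look like multiPost("https://www.apisite.com", $postDatas), and the responses can then be processed after all requests have finished.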
I have the following code:
function curl($url) {
$options = Array(
CURLOPT_HEADER => TRUE,
CURLOPT_RETURNTRANSFER => TRUE,
CURLOPT_FOLLOWLOCATION => TRUE,
CURLOPT_AUTOREFERER => TRUE,
CURLOPT_CONNECTTIMEOUT => 120,
CURLOPT_TIMEOUT => 120,
CURLOPT_MAXREDIRS => 10,
CURLOPT_URL => $url,
);
$ch = curl_init();
curl_setopt_array($ch, $options);
$data = curl_exec($ch);
$httpCode = curl_getinfo($ch);
curl_close($ch);
return $data;
}
How can I return the data as well as the HTTP status code of the URL being fetched?
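A common way to return a set of data from a function is to use an array, either numeric or associative. Note that curl_getinfo($ch) without a second argument returns an array with all transfer details; pass CURLINFO_HTTP_CODE to get only the status code. This can be done like:
function curl($url) {
// other code here
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
return array($data, $httpCode);
}
// or
function curl($url) {
// other code here
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
return array(
'data' => $data,
'statusCode' => $httpCode
);
}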
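Put together, the associative variant might look like the sketch below. The function name fetchWithStatus, the shortened option list and the choice to return null for a failed transfer are assumptions of mine, not something from the question.

```php
<?php
// Hypothetical variant of the curl() function from the question that
// returns both the response data and the HTTP status code.
function fetchWithStatus(string $url): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
    ]);
    $data = curl_exec($ch); // false when the transfer failed

    // Request the status code specifically; it is 0 when no HTTP
    // response was received at all.
    $statusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return [
        'data'       => $data === false ? null : $data,
        'statusCode' => $statusCode,
    ];
}

// Usage (commented out so that loading this file sends no request):
// $result = fetchWithStatus('https://example.com/');
// if ($result['statusCode'] === 200) { echo $result['data']; }
```

The caller then reads $result['data'] and $result['statusCode'] instead of a single return value.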
I want to store the results of curl_multi_exec in a variable. Setting CURLOPT_RETURNTRANSFER to TRUE did not work for me, so after some research I added curl_multi_getcontent. That works in the sense that it records values in the variable, but the problem is that it only stores a few of the results.
$ch = curl_init();
curl_setopt_array($ch, array(
CURLOPT_URL => $stream_url,
CURLOPT_ENCODING => "gzip",
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_HTTPAUTH => CURLAUTH_BASIC,
CURLOPT_TIMEOUT => 10,
CURLOPT_USERPWD => $user.":".$pass,
CURLOPT_WRITEFUNCTION => "print_out_data",
//CURLOPT_RETURNTRANSFER => true,
CURLOPT_VERBOSE => true // uncomment for curl verbosity
));
$running = null;
$mh = curl_multi_init();
curl_multi_add_handle($mh, $ch);
do {
curl_multi_select($mh, 1);
curl_multi_exec($mh, $running);
$content = curl_multi_getcontent($ch);
$arr = json_decode($content, true);
// print_r($arr);
$foo = $arr['id'];
$bar = $arr['body'];
} while($running > 0);
curl_multi_remove_handle($mh, $ch);
curl_multi_close($mh);
Before the do { } while () loop, add:
$content = array();
Replace the line
$content = curl_multi_getcontent($ch);
with
$content[] = curl_multi_getcontent($ch);
After the loop, add:
print_r($content);
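The question also sets CURLOPT_WRITEFUNCTION to a callback named print_out_data that is not shown. As an alternative way to collect everything, here is a rough sketch of a callback object that accumulates the streamed data itself. It assumes the stream sends one JSON object per line, which the question does not state, and the class name StreamCollector is made up for this example.

```php
<?php
// Hypothetical collector for a streamed response. Assumption: the server
// sends one JSON object per line (newline-delimited JSON).
class StreamCollector
{
    public $records = [];  // decoded records received so far
    private $buffer = '';  // partial line carried over between chunks

    // Matches the CURLOPT_WRITEFUNCTION callback signature. It has to
    // return the number of bytes it handled, otherwise cURL aborts.
    public function write($ch, string $chunk): int
    {
        $this->buffer .= $chunk;
        // A chunk can end in the middle of a line, so only complete
        // lines are decoded and the remainder stays in the buffer.
        while (($pos = strpos($this->buffer, "\n")) !== false) {
            $line = trim(substr($this->buffer, 0, $pos));
            $this->buffer = (string) substr($this->buffer, $pos + 1);
            if ($line === '') {
                continue; // skip blank lines
            }
            $decoded = json_decode($line, true);
            if (is_array($decoded)) {
                $this->records[] = $decoded;
            }
        }
        return strlen($chunk);
    }
}
```

It would be registered with CURLOPT_WRITEFUNCTION => array($collector, 'write') where $collector = new StreamCollector(), and after the loop $collector->records holds every decoded record, each with its own 'id' and 'body' keys if the stream provides them.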
Can you give me some ideas on how to improve this function so that it handles an unexpected reply, where the server returns output that is not XML (e.g. a simple server error message in HTML), and then retries fetching the XML?
function fetch_xml($url, $timeout=15)
{
$ch = curl_init();
curl_setopt_array($ch, array(
CURLOPT_HEADER => 0,
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_CONNECTTIMEOUT => (int)$timeout,
CURLOPT_FOLLOWLOCATION => 1,
CURLOPT_URL => $url)
);
$xml_data = curl_exec($ch);
curl_close($ch);
if (!empty($xml_data)) {
return new SimpleXmlElement($xml_data);
}
else {
return null;
}
}
You can give this a try. I haven't tested it out.
function fetch_xml($url, $timeout = 15, $max_attempts = 5, $attempts = 0)
{
$ch = curl_init();
curl_setopt_array($ch, array(
CURLOPT_HEADER => 0,
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_CONNECTTIMEOUT => (int)$timeout,
CURLOPT_FOLLOWLOCATION => 1,
CURLOPT_URL => $url)
);
$xml_data = curl_exec($ch);
curl_close($ch);
if ($attempts < $max_attempts && !empty($xml_data)) // don't loop forever
{
try
{
return new SimpleXmlElement($xml_data);
}
catch (Exception $e)
{
// pass $attempts + 1; $attempts++ would pass the old value, so the counter would never grow
return fetch_xml($url, (int)$timeout, $max_attempts, $attempts + 1);
}
}
return NULL;
}
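A loop-based variant of the same retry idea is sketched below, also only lightly checked. It uses simplexml_load_string() with libxml_use_internal_errors() so that a non-XML reply produces null instead of warnings. The function names and the $fetch callable (anything that returns the raw response string, for example a closure around the cURL code above) are assumptions for illustration.

```php
<?php
// Returns a SimpleXMLElement, or null when $data is not well-formed XML.
function parse_xml_or_null(string $data): ?SimpleXMLElement
{
    if (trim($data) === '') {
        return null;
    }
    $previous = libxml_use_internal_errors(true); // collect errors quietly
    $xml = simplexml_load_string($data);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore previous setting
    return $xml === false ? null : $xml;
}

// Calls $fetch up to $max_attempts times until the reply parses as XML.
// $delay_us is an optional pause between attempts, in microseconds.
function fetch_xml_with_retry(callable $fetch, int $max_attempts = 5, int $delay_us = 0): ?SimpleXMLElement
{
    for ($attempt = 1; $attempt <= $max_attempts; $attempt++) {
        $xml = parse_xml_or_null((string) $fetch());
        if ($xml !== null) {
            return $xml;
        }
        if ($delay_us > 0 && $attempt < $max_attempts) {
            usleep($delay_us);
        }
    }
    return null; // every attempt returned something that was not XML
}
```

One caveat: an HTML error page that happens to be well-formed would still parse, so checking the name of the root element (for example with $xml->getName()) may be needed as well.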
I used a single curl_init handle to issue an HTTP GET and everything worked fine.
Now I am trying to use curl_multi_init (as I need to get multiple URLs), and the response to the GET is a 401 with the message "This request requires HTTP authentication".
The same cURL options were used in both cases.
Here is the code for the multi init function, and below it the single init function.
protected function _multiQueryRunkeeper($uri, $subscribersInfo,$acceptHeader) {
$curlOptions = array(
CURLOPT_URL => 'https://api.runkeeper.com' . $uri,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_TIMEOUT => 8,
CURLOPT_HTTPAUTH => CURLAUTH_ANY,
CURLOPT_HTTPGET => true
);
$curl_array = array();
$mh = curl_multi_init();
foreach ($subscribersInfo as $i => $subscriber) {
$curl_array[$i] = curl_init();
curl_setopt_array($curl_array[$i],$curlOptions);
curl_setopt($curl_array[$i], CURLOPT_HEADER,
array('Authorization: Bearer '.$subscriber['token'],
'Accept: application/vnd.com.runkeeper.' . $acceptHeader));
curl_multi_add_handle($mh,$curl_array[$i]);
}
$running = NULL;
do {
usleep(10000);
curl_multi_exec($mh,$running);
} while($running > 0);
$subscribersWorkoutFeed = array();
foreach($subscribersInfo as $i => $subscriber)
{
$subscribersWorkoutFeed[$i] = curl_multi_getcontent($curl_array[$i]);
curl_multi_remove_handle($mh, $curl_array[$i]);
}
curl_multi_close($mh);
return $subscribersWorkoutFeed;
}
protected function _singleQueryRunkeeper($uri, $subscriberToken,$acceptHeader) {
try{
// get fitness user's fitness activities from Runkeeper
$this->_curl = isset($this->_curl)? $this->_curl : curl_init();
$curlOptions = array(
CURLOPT_URL => 'https://api.runkeeper.com' . $uri,
CURLOPT_HTTPHEADER => array('Authorization: Bearer '.$subscriberToken,
'Accept: application/vnd.com.runkeeper.' . $acceptHeader),
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_TIMEOUT => 8,
CURLOPT_HTTPGET => true
);
curl_setopt_array($this->_curl,$curlOptions);
$response = curl_exec($this->_curl);
if($response == false) {
if (Zend_Registry::isRegistered('logger')) {
$logger = Zend_Registry::get('logger');
$logger->log('Curl error on _singleQueryRunkeeper: '
. curl_error($this->_curl), Zend_Log::INFO);
}
return null;
}
$data = Zend_Json::decode($response);
return($data);
} catch(Exception $e){
if (Zend_Registry::isRegistered('logger')) {
$logger = Zend_Registry::get('logger');
$logger->log('exception occurred on getUsersLatestWorkoutsFromRK. Curl error'
. curl_error($this->_curl), Zend_Log::INFO);
}
}
}
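One visible difference between the two functions: the multi version passes the header array to CURLOPT_HEADER, while the single version passes it to CURLOPT_HTTPHEADER. CURLOPT_HEADER is a boolean that controls whether response headers are included in the output, so with that option the Authorization header is never sent, which would be consistent with the 401. The sketch below builds the per-subscriber options with CURLOPT_HTTPHEADER. The helper name buildSubscriberOptions is hypothetical, and I have not tried this against the Runkeeper API.

```php
<?php
// Hypothetical helper: options for one subscriber's handle in the multi
// version, mirroring the options of the working single version.
function buildSubscriberOptions(string $uri, string $token, string $acceptHeader): array
{
    return [
        CURLOPT_URL            => 'https://api.runkeeper.com' . $uri,
        // Request headers belong in CURLOPT_HTTPHEADER, not CURLOPT_HEADER.
        CURLOPT_HTTPHEADER     => [
            'Authorization: Bearer ' . $token,
            'Accept: application/vnd.com.runkeeper.' . $acceptHeader,
        ],
        CURLOPT_SSL_VERIFYPEER => false, // kept from the question
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 8,
        CURLOPT_HTTPGET        => true,
    ];
}
```

Inside the foreach loop, the curl_setopt_array() and curl_setopt() calls could then be replaced by a single curl_setopt_array($curl_array[$i], buildSubscriberOptions($uri, $subscriber['token'], $acceptHeader)).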