curl_multi_exec() not requesting all - php

I am trying to use curl_multi_exec() in PHP with, I am guessing, about 4000 POST calls, collecting the JSON each call returns. However, after 234 records in my results, my print_r starts showing nothing. I can change the order of the URLs, since each URL has a different POSTFIELDS value, but I still get only 234 results. Does anybody know whether curl_multi_exec() has any limits? I am using a XAMPP server on my computer to retrieve the JSON from a remote server. Is an option in my XAMPP install preventing more results, or is the remote server limiting my connections?
Thanks. My code for the function is below. The function takes an input $opt, which is an array of cURL option arrays, one per request.
$ch = array();
$results = array();
$mh = curl_multi_init();
foreach ($opt as $handler => $array) {
    //print_r($array);
    //echo "<br><br>";
    $ch[$handler] = curl_init();
    curl_setopt_array($ch[$handler], $array);
    curl_multi_add_handle($mh, $ch[$handler]);
}
$running = null;
do {
    curl_multi_exec($mh, $running);
} while ($running > 0);
// Get content and remove handles.
foreach ($ch as $key => $val) {
    $results[$key] = json_decode(curl_multi_getcontent($val), true);
    curl_multi_remove_handle($mh, $val);
}
curl_multi_close($mh);
return $results;
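For what it's worth, the do/while loop above spins the CPU while transfers run and silently stores null when a handle fails. A sketch of a gentler variant that waits on curl_multi_select() and surfaces per-handle errors (which may reveal why results stop after 234) could look like this; the function name multi_fetch is hypothetical and $opt is assumed to be the same array of per-handle option arrays:

```php
<?php
// Hypothetical sketch: same multi-handle flow as the question, but
// yielding to the OS between iterations and recording cURL errors
// instead of silently decoding empty bodies to null.
function multi_fetch(array $opt): array
{
    $ch = [];
    $results = [];
    $mh = curl_multi_init();

    foreach ($opt as $key => $options) {
        $ch[$key] = curl_init();
        curl_setopt_array($ch[$key], $options);
        curl_multi_add_handle($mh, $ch[$key]);
    }

    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            // Block until there is activity on some handle.
            curl_multi_select($mh);
        }
    } while ($running && $status === CURLM_OK);

    foreach ($ch as $key => $handle) {
        $err = curl_error($handle);
        if ($err !== '') {
            // Keep the error message so failures are visible in print_r.
            $results[$key] = ['error' => $err];
        } else {
            $results[$key] = json_decode(curl_multi_getcontent($handle), true);
        }
        curl_multi_remove_handle($mh, $handle);
        curl_close($handle);
    }
    curl_multi_close($mh);
    return $results;
}
```

If the errors turn out to be connection refusals or timeouts past a certain count, that points at a server-side connection limit rather than anything in XAMPP.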

Related

PHP: cURLing the same API numerous times with curl_multi

I have an API that I call at least 10 times concurrently with different information.
This is the function I am currently using:
$mh = curl_multi_init();
$arr = array();
$rows = array();
while ($row = mysqli_fetch_array($query)) {
    array_push($arr, initiate_curl($row, $mh));
    array_push($rows, $row);
}
$running = null;
for (;;) {
    curl_multi_exec($mh, $running);
    if (!$running) {
        break;
    }
    curl_multi_select($mh);
    usleep(1);
}
sleep(1);
foreach ($arr as $curl) {
    curl_multi_remove_handle($mh, $curl);
}
curl_multi_close($mh);
foreach ($arr as $key => $curl) {
    $result = curl_multi_getcontent($curl);
    $dat = simplexml_load_string($result);
    check_time($dat, $rows[$key], $fp);
}
It works fine when the number of requests is small, but when it grows, some of the cURL handles do not bring back the appropriate data, i.e. they return null. I am guessing the script moves on before anything has happened on the server side.
What can I do to make this work? I am inexperienced with PHP and servers and am having a hard time going through the documentation.
If I created another PHP file that cURLs the API and does the work with the data, and then multi-cURLed that PHP file, would it work better? (In that case it would matter less that some of the calls return no data.) Would that overload my server constantly?
This is not a cURL issue; it is a server-utilisation issue, so try upgrading the server.
Useful links: link1 , link2
Note: be careful when using an infinite loop [for (;;)] and monitor CPU utilisation regularly.
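Independent of server capacity, one commonly used mitigation is to run the requests in fixed-size batches so the API only ever sees a bounded number of simultaneous connections. A sketch, assuming the question's own initiate_curl() and check_time() helpers are passed in as callables (the function name run_in_batches and the batch size are assumptions):

```php
<?php
// Hypothetical sketch: process rows in batches of $batchSize so the
// remote API is never hit with all requests at once. $initiate_curl
// and $check_time stand in for the question's helpers of the same name.
function run_in_batches(
    array $rows,
    callable $initiate_curl,
    callable $check_time,
    $fp,
    int $batchSize = 50
): void {
    foreach (array_chunk($rows, $batchSize) as $batch) {
        $mh = curl_multi_init();
        $handles = [];
        foreach ($batch as $i => $row) {
            $handles[$i] = $initiate_curl($row, $mh);
        }
        do {
            curl_multi_exec($mh, $running);
            if ($running) {
                curl_multi_select($mh); // sleep until a transfer is ready
            }
        } while ($running);
        foreach ($handles as $i => $curl) {
            $dat = simplexml_load_string(curl_multi_getcontent($curl));
            $check_time($dat, $batch[$i], $fp);
            curl_multi_remove_handle($mh, $curl);
        }
        curl_multi_close($mh);
    }
}
```

Smaller batches trade total wall-clock time for fewer null responses; the right size depends on what the API tolerates.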

PHP CLI: writing and reading multiple raw TCP APIs in parallel, like curl_multi_init for HTTP(S) APIs

My situation:
I have multiple servers running a raw TCP API that requires me to send a string to get information back. I need a response within a timeout of 5 seconds. All APIs should be contacted at the same time, and from then on they have 5 seconds to respond. (So the maximum execution time is 5 seconds for all servers at once.)
I already managed to do so for HTTP/S APIs with PHP cURL:
// array of curl handles
$multiCurl = array();
// data to be returned
$result = array();
// multi handle
$mh = curl_multi_init();
foreach ($row_apis as $api) {
    $id = $api[0];
    $ip = $api[1];
    $port = $api[2];
    // URL from which data will be fetched
    $fetchURL = "$ip:$port/api/status";
    $multiCurl[$id] = curl_init();
    curl_setopt($multiCurl[$id], CURLOPT_URL, $fetchURL);
    //curl_setopt($multiCurl[$id], CURLOPT_HEADER, 0);
    curl_setopt($multiCurl[$id], CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    curl_setopt($multiCurl[$id], CURLOPT_CUSTOMREQUEST, "GET");
    curl_setopt($multiCurl[$id], CURLOPT_TIMEOUT, 5);
    curl_setopt($multiCurl[$id], CURLOPT_RETURNTRANSFER, 1);
    curl_multi_add_handle($mh, $multiCurl[$id]);
}
$index = null;
do {
    curl_multi_exec($mh, $index);
} while ($index > 0);
// get content and remove handles
foreach ($multiCurl as $k => $ch) {
    $result[$k] = json_decode(curl_multi_getcontent($ch), true);
    curl_multi_remove_handle($mh, $ch);
}
// close
curl_multi_close($mh);
This sample fetches all APIs at once and waits up to 5 seconds for a response; it never takes longer than 5 seconds.
Is there a way to do the same thing with raw TCP APIs in PHP?
I already tried to use sockets and was able to get the information, but every API was queried one after another, so the script takes far too long for multiple servers.
Thanks for your help.
EDIT:
I've tried to implement your suggestions and my code now looks like this:
$apis = array();
$apis[0] = array(1, "123.123.123.123", 1880);
$method = "summary";
$sockets = array();
// Create socket array
foreach ($apis as $api) {
    $id = $api[0];
    $ip = $api[1];
    $port = $api[2];
    $sockets[$id] = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    socket_set_nonblock($sockets[$id]);
    @socket_connect($sockets[$id], $ip, $port);
    //socket_write($sockets[$id], $method);
}
// Write to every socket
/*
foreach ($sockets as $socket) {
    socket_write($socket, $method);
    //fwrite($socket, "$method");
}
*/
// Read all sockets for 5 seconds
$write = NULL;
$except = NULL;
$responses = socket_select($sockets, $write, $except, 5);
// Check result
if ($responses === false) {
    echo "Did not work";
} elseif ($responses > 0) {
    echo "At least one has responded";
}
// Access the data
//???
But I'm getting 0 as the result of socket_select...
When do I need to write the method string to the socket?
And if I do get something back, how do I access the data in the response?
Absolutely. Set SO_SNDBUF to an appropriate size so you can send all the requests instantly/non-blockingly, then send all the requests, then start waiting for and reading the responses.
The easy way to do the reading is to call socket_set_block on the sockets and read all the responses one by one, but this does not give a hard guarantee of a 5-second timeout (then again, neither does your example curl_multi code). If you need a hard 5-second timeout, use socket_set_nonblock and socket_select instead.
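The ordering this answer describes can be sketched as follows. A non-blocking connect "fails" immediately with EINPROGRESS, and the connect is actually complete once the socket becomes writable, which is why the edited code in the question gets 0 from socket_select: it writes nothing and then selects on readability. The function name query_tcp_apis is hypothetical; $apis is the question's [[id, ip, port], ...] array and $method its request string:

```php
<?php
// Hypothetical sketch: connect non-blocking, wait for writability
// (connect finished), send the request, then select on readability
// until the overall deadline.
function query_tcp_apis(array $apis, string $method, float $timeout = 5.0): array
{
    $deadline = microtime(true) + $timeout;
    $sockets = [];
    foreach ($apis as [$id, $ip, $port]) {
        $sockets[$id] = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
        socket_set_nonblock($sockets[$id]);
        @socket_connect($sockets[$id], $ip, $port); // returns false, EINPROGRESS
    }

    // A non-blocking connect is complete once the socket is writable.
    $read = null;
    $except = null;
    $write = $sockets;
    if (socket_select($read, $write, $except, (int)$timeout) > 0) {
        foreach ($write as $sock) {
            socket_write($sock, $method);
        }
    }

    // Poll for responses until the deadline; read with socket_read().
    $responses = [];
    $pending = $sockets;
    while ($pending && ($left = $deadline - microtime(true)) > 0) {
        $read = $pending;
        $write = null;
        $except = null;
        if (socket_select($read, $write, $except, (int)ceil($left)) < 1) {
            break; // overall timeout reached
        }
        foreach ($read as $sock) {
            $id = array_search($sock, $sockets, true);
            $responses[$id] = socket_read($sock, 8192);
            socket_close($sock);
            unset($pending[$id]);
        }
    }
    return $responses;
}
```

This reads a single chunk per server; a protocol that streams longer replies would need to keep reading until the peer closes the connection.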

Making multiple curl requests without timeout.

I am developing a system where I need to fetch 5000+ users' locations using multiple GET requests. Unfortunately the API endpoint doesn't support multiple client ids, i.e. I have to make 5000+ unique GET requests to fetch the locations, then use the cumulative response to make another API call.
I am using cURL to make the requests, with the following snippet:
<?php
function multiRequest($data, $options = array()) {
    // array of curl handles
    $curly = array();
    // data to be returned
    $result = array();
    // multi handle
    $mh = curl_multi_init();
    // loop through $data and create curl handles,
    // then add them to the multi-handle
    foreach ($data as $id => $d) {
        $curly[$id] = curl_init();
        $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
        curl_setopt($curly[$id], CURLOPT_URL, $url);
        curl_setopt($curly[$id], CURLOPT_HEADER, 0);
        curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);
        // post?
        if (is_array($d) && !empty($d['post'])) {
            curl_setopt($curly[$id], CURLOPT_POST, 1);
            curl_setopt($curly[$id], CURLOPT_POSTFIELDS, $d['post']);
        }
        // extra options?
        if (!empty($options)) {
            curl_setopt_array($curly[$id], $options);
        }
        curl_multi_add_handle($mh, $curly[$id]);
    }
    // execute the handles
    $running = null;
    do {
        curl_multi_exec($mh, $running);
    } while ($running > 0);
    // get content and remove handles
    foreach ($curly as $id => $c) {
        $result[$id] = curl_multi_getcontent($c);
        curl_multi_remove_handle($mh, $c);
    }
    // all done
    curl_multi_close($mh);
    return $result;
}
?>
It works perfectly for a small number of requests, but when I try to hit 1000+ it times out.
$data = [];
for ($i = 0; $i < 1000; $i++) {
    $data[] = 'https://foo.bar/api/loc/v/queries/location?address=XXXXXXXXX';
}
$token = $this->refresh();
$r = $this->multiRequest($data, $token);
What is the best approach to solve this issue?
a. Increase the maximum_execution_time of the PHP script, or
b. Use something like multithreading, or
c. Something else?
Is there a way to modify the endpoint API to allow it to process multiple ids? If yes, that is preferred, because if you run several thousand requests at the same time you are effectively mounting something like a DDoS attack.
However, you may want to check PHP's curl_multi_* functions (http://us3.php.net/manual/en/function.curl-multi-exec.php).
Another link that can be useful: http://www.onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/
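A common way to apply curl_multi here without either a huge single batch (which times out) or one request at a time is a rolling window: keep a fixed number of transfers in flight and top up from the queue as each one finishes, draining completions with curl_multi_info_read(). A sketch under assumptions (the function name rollingRequest, the window size, and the timeout are all illustrative):

```php
<?php
// Hypothetical sketch: only $window transfers in flight at once.
// $urls is id => url; finished handles are replaced from the queue.
function rollingRequest(array $urls, int $window = 100): array
{
    $results = [];
    $map = []; // id => live curl handle
    $mh = curl_multi_init();
    reset($urls);

    $add = function () use (&$urls, &$map, $mh) {
        $id = key($urls);
        if ($id === null) {
            return false; // queue exhausted
        }
        $ch = curl_init(current($urls));
        next($urls);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        curl_multi_add_handle($mh, $ch);
        $map[$id] = $ch;
        return true;
    };

    // Prime the window.
    for ($i = 0; $i < $window; $i++) {
        if (!$add()) {
            break;
        }
    }

    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh);
        // Drain completed transfers and refill the window.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $id = array_search($ch, $map, true);
            $results[$id] = curl_multi_getcontent($ch);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            unset($map[$id]);
            $add();
        }
    } while ($running || $map);

    curl_multi_close($mh);
    return $results;
}
```

Because no single transfer waits behind thousands of others, each one stays within its own CURLOPT_TIMEOUT, and max_execution_time pressure drops without raising any limits.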

Fastest way to download multiple URLs

I have a web portal that needs to download many separate JSON files and display their contents in a sort of form view. By many I mean 32 separate files minimum.
I've tried cURL with brute-force iteration and it takes ~12.5 seconds.
I've tried curl_multi_exec as demonstrated at http://www.php.net/manual/en/function.curl-multi-init.php with the function below, and it takes ~9 seconds. A little better, but still terribly slow.
function multiple_threads_request($nodes) {
    $mh = curl_multi_init();
    $curl_array = array();
    foreach ($nodes as $i => $url) {
        $curl_array[$i] = curl_init($url);
        curl_setopt($curl_array[$i], CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $curl_array[$i]);
    }
    $running = NULL;
    do {
        curl_multi_exec($mh, $running);
    } while ($running > 0);
    $res = array();
    foreach ($nodes as $i => $url) {
        $res[$url] = curl_multi_getcontent($curl_array[$i]);
    }
    foreach ($nodes as $i => $url) {
        curl_multi_remove_handle($mh, $curl_array[$i]);
    }
    curl_multi_close($mh);
    return $res;
}
I realize this is an inherently expensive operation, but does anyone know any alternatives that might be faster?
EDIT: In the end, my system was limiting curl_multi_exec, and moving the code to a production machine brought dramatic improvements.
You should definitely benchmark your cURL requests to see which one has the slowdown. This was too lengthy for a comment, so let me know whether it helps:
// revert to "cURLing with brute force iteration" as you described it :)
$curl_timer = array();
foreach ($curlsite as $row) {
    $start = microtime(true);
    /**
     * curl code
     */
    $curl_timer[] = (microtime(true) - $start);
}
echo '<pre>' . print_r($curl_timer, true) . '</pre>';

cURL Multi Simultaneous Requests (domain check)

I'm trying to take a list of 20,000+ domain names and check whether they are "alive". All I really need is a simple HTTP code check, but I can't figure out how to get that working with curl_multi. In a separate script I have the following function, which simultaneously checks a batch of 1000 domains and returns the JSON response code. Maybe it can be modified to just get the HTTP response code instead of the page content?
$dotNetRequests = array of domains...

// loop through arrays
foreach (array_chunk($dotNetRequests, 1000) as $Netrequests) {
    $results = checkDomains($Netrequests);
    $NetcurlRequest = array_merge($NetcurlRequest, $results);
}

function checkDomains($data) {
    // array of curl handles
    $curly = array();
    // data to be returned
    $result = array();
    // multi handle
    $mh = curl_multi_init();
    // loop through $data and create curl handles,
    // then add them to the multi-handle
    foreach ($data as $id => $d) {
        $curly[$id] = curl_init();
        $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
        curl_setopt($curly[$id], CURLOPT_URL, $url);
        curl_setopt($curly[$id], CURLOPT_HEADER, 0);
        curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);
        // post?
        if (is_array($d) && !empty($d['post'])) {
            curl_setopt($curly[$id], CURLOPT_POST, 1);
            curl_setopt($curly[$id], CURLOPT_POSTFIELDS, $d['post']);
        }
        curl_multi_add_handle($mh, $curly[$id]);
    }
    // execute the handles
    $running = null;
    do {
        curl_multi_exec($mh, $running);
    } while ($running > 0);
    // get content and remove handles
    foreach ($curly as $id => $c) {
        // $result[$id] = curl_multi_getcontent($c);
        // if ($result[$id]) {
        if (curl_multi_getcontent($c)) {
            $netName = $data[$id];
            $dName = str_replace(".net", ".com", $netName);
            $query = "UPDATE table1 SET dotnet = '1' WHERE Domain = '$dName'";
            mysql_query($query);
        }
        curl_multi_remove_handle($mh, $c);
    }
    // all done
    curl_multi_close($mh);
    return $result;
}
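To answer the question's own aside about getting just the HTTP response code: switching the handles to HEAD requests via CURLOPT_NOBODY and reading CURLINFO_HTTP_CODE after the multi loop avoids downloading page bodies entirely. A sketch (the function name checkDomainCodes and the timeout values are assumptions, not part of the original script):

```php
<?php
// Hypothetical variant of checkDomains() that records only the HTTP
// status code per domain. CURLOPT_NOBODY issues a HEAD request; a
// code of 0 means the connection itself failed.
function checkDomainCodes(array $domains): array
{
    $handles = [];
    $codes = [];
    $mh = curl_multi_init();

    foreach ($domains as $id => $domain) {
        $handles[$id] = curl_init($domain);
        curl_setopt($handles[$id], CURLOPT_NOBODY, true); // HEAD, no body
        curl_setopt($handles[$id], CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handles[$id], CURLOPT_CONNECTTIMEOUT, 5);
        curl_setopt($handles[$id], CURLOPT_TIMEOUT, 10);
        curl_multi_add_handle($mh, $handles[$id]);
    }

    do {
        curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh);
        }
    } while ($running);

    foreach ($handles as $id => $ch) {
        // e.g. 200/301 = alive, 0 = unreachable
        $codes[$id] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $codes;
}
```

Note that some servers reject HEAD requests; those would need a fallback to a normal GET for an accurate check.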
In any other language you would thread this kind of operation ...
https://github.com/krakjoe/pthreads
And you can in PHP too :)
I would suggest a few workers rather than 20,000 individual threads. Not that 20,000 threads is out of the realms of possibility (it isn't), but it wouldn't be a good use of resources. I would do as you are now and have 20 workers getting the results of 1000 domains each. I assume you don't need an example of getting a response code; I'm sure cURL would give it to you. But it's probably overkill to use cURL when you don't require its threading capabilities: I would fsockopen port 80, fprintf "GET / HTTP/1.0\r\n\r\n", fgets the first line, and close the connection. If you're going to be doing this all the time, I would also send "Connection: close" so that the receiving machines are not holding connections open unnecessarily.
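The fsockopen approach this answer outlines could be sketched as below; the helper name isAlive is hypothetical, and note it only checks plain HTTP on port 80:

```php
<?php
// Hypothetical sketch of the answer's suggestion: open port 80, send a
// minimal request with "Connection: close", and read just the status line.
function isAlive(string $domain, float $timeout = 5.0): bool
{
    $fp = @fsockopen($domain, 80, $errno, $errstr, $timeout);
    if ($fp === false) {
        return false; // no TCP connection at all
    }
    stream_set_timeout($fp, (int)$timeout);
    fwrite($fp, "HEAD / HTTP/1.0\r\nHost: $domain\r\nConnection: close\r\n\r\n");
    $statusLine = fgets($fp); // e.g. "HTTP/1.0 200 OK"
    fclose($fp);
    return $statusLine !== false && strpos($statusLine, 'HTTP/') === 0;
}
```

Each worker process would simply loop this over its slice of the domain list.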
This script works great for handling bulk simultaneous cURL requests in PHP.
I'm able to parse through 50k domains in just a few minutes with it!
https://github.com/petewarden/ParallelCurl/
