cURL Multi Simultaneous Requests (domain check) - PHP

I'm trying to take a list of 20,000+ domain names and check if they are "alive". All I really need is a simple HTTP status-code check, but I can't figure out how to get that working with curl_multi. In a separate script I have the following function, which simultaneously checks a batch of 1000 domains and returns the JSON response. Maybe this can be modified to just get the HTTP response code instead of the page content?
$dotNetRequests = array(/* ... list of 20,000+ domains ... */);
$NetcurlRequest = array();

// loop through the domains in chunks of 1000
foreach (array_chunk($dotNetRequests, 1000) as $Netrequests) {
    $results = checkDomains($Netrequests);
    $NetcurlRequest = array_merge($NetcurlRequest, $results);
}
function checkDomains($data) {
    // array of curl handles
    $curly = array();
    // data to be returned
    $result = array();
    // multi handle
    $mh = curl_multi_init();

    // loop through $data and create curl handles,
    // then add them to the multi handle
    foreach ($data as $id => $d) {
        $curly[$id] = curl_init();
        $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
        curl_setopt($curly[$id], CURLOPT_URL, $url);
        curl_setopt($curly[$id], CURLOPT_HEADER, 0);
        curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);
        // post?
        if (is_array($d) && !empty($d['post'])) {
            curl_setopt($curly[$id], CURLOPT_POST, 1);
            curl_setopt($curly[$id], CURLOPT_POSTFIELDS, $d['post']);
        }
        curl_multi_add_handle($mh, $curly[$id]);
    }

    // execute the handles
    $running = null;
    do {
        curl_multi_exec($mh, $running);
    } while ($running > 0);

    // get content and remove handles
    foreach ($curly as $id => $c) {
        if (curl_multi_getcontent($c)) {
            $netName = $data[$id];
            $dName = str_replace(".net", ".com", $netName);
            $query = "UPDATE table1 SET dotnet = '1' WHERE Domain = '$dName'";
            mysql_query($query);
        }
        curl_multi_remove_handle($mh, $c);
    }

    // all done
    curl_multi_close($mh);
    return $result;
}
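For reference, a minimal sketch of that modification (my illustration, untested): request headers only with CURLOPT_NOBODY, then read the status code per handle with curl_getinfo().

// In the handle-creation loop, skip the body entirely:
curl_setopt($curly[$id], CURLOPT_NOBODY, 1);         // HEAD-style request, no body download
curl_setopt($curly[$id], CURLOPT_CONNECTTIMEOUT, 5); // assumed timeout so dead hosts don't stall the batch

// ...and in the cleanup loop, read the status instead of the content:
foreach ($curly as $id => $c) {
    $result[$id] = curl_getinfo($c, CURLINFO_HTTP_CODE); // 0 means no HTTP response at all
    curl_multi_remove_handle($mh, $c);
}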

In any other language you would thread this kind of operation ...
https://github.com/krakjoe/pthreads
And you can in PHP too :)
I would suggest a few workers rather than 20,000 individual threads ... not that 20,000 threads is out of the realms of possibility (it isn't), but that wouldn't be a good use of resources. I would do as you are now and have 20 workers getting the results of 1000 domains each.
I assume you don't need an example of getting a response code; I'm sure cURL would give it to you, but it's probably overkill to use cURL since you don't require its threading capabilities: I would fsockopen port 80, fprintf "GET / HTTP/1.0\r\n\r\n", fgets the first line, and close the connection. If you're going to be doing this all the time, I would also send "Connection: close" so that the receiving machines are not holding connections unnecessarily.
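A minimal sketch of that fsockopen approach (my illustration of the answer above; it assumes plain HTTP on port 80 and that the status line alone tells you the host is alive):

function isAlive($domain) {
    // 5-second connect timeout is an assumption; tune to taste
    $fp = @fsockopen($domain, 80, $errno, $errstr, 5);
    if (!$fp) {
        return false; // nothing listening, or host is down
    }
    fprintf($fp, "GET / HTTP/1.0\r\nHost: %s\r\nConnection: close\r\n\r\n", $domain);
    $statusLine = fgets($fp, 128); // e.g. "HTTP/1.0 200 OK"
    fclose($fp);
    return $statusLine !== false && strpos($statusLine, 'HTTP/') === 0;
}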

This script works great for handling bulk simultaneous cURL requests using PHP.
I'm able to parse through 50k domains in just a few minutes using it!
https://github.com/petewarden/ParallelCurl/

Related

How to use PHP curl_multi_init() when I need to pass custom options?

I have a custom cURL function that has to download a huge number of images from a remote server. I was banned a couple of times in the past when I used file_get_contents(). I found that curl_multi_init() is a better option, since with one connection it can download, for example, 20 images at once.
I made a custom function that uses curl_init(), and I am trying to figure out how to implement curl_multi_init(), so that in my loop where I grab the list of 20 URLs from the database I can call my custom function, and on the last iteration use curl_close(). As it stands, my function opens a new connection for each URL in the loop. Here is the function:
function downloadUrlToFile($remoteurl, $newfileName) {
    $errors = 0;
    $options = array(
        CURLOPT_FILE           => fopen('../images/products/' . $newfileName, 'w'),
        CURLOPT_TIMEOUT        => 28800,
        CURLOPT_URL            => $remoteurl,
        CURLOPT_RETURNTRANSFER => 1
    );
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $imageString = curl_exec($ch);
    $image = imagecreatefromstring($imageString);
    if ($imageString !== false AND !empty($imageString)) {
        if ($image !== false) {
            $width_orig = imagesx($image);
            if ($width_orig > 1000) {
                $saveimage = copy_and_resize_remote_image_product($image, $newfileName);
            } else {
                $saveimage = file_put_contents('../images/products/' . $newfileName, $imageString);
            }
        } else {
            $errors++;
        }
    } else {
        $errors++;
    }
    curl_close($ch);
    return $errors;
}
There has to be a way to use curl_multi_init() with my function downloadUrlToFile, because:
I need to change the file name on the fly.
In my function I am also checking several things about the remote image. In the sample function I only check the size and resize it if necessary, but the real function does much more (I cut that part for brevity; I also use the function to pass more variables).
How should the code be changed so that the loop connects to the remote server only once?
Thanks in advance.
Try this pattern for multi cURL:
$urls = array($url_1, $url_2, $url_3);
$content = array();
$ch = array();
$mh = curl_multi_init();

foreach ($urls as $index => $url) {
    $ch[$index] = curl_init();
    curl_setopt($ch[$index], CURLOPT_URL, $url);
    curl_setopt($ch[$index], CURLOPT_HEADER, 0);
    curl_setopt($ch[$index], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch[$index]);
}

$active = null;
for (;;) {
    curl_multi_exec($mh, $active);
    if ($active < 1) {
        // all downloads completed
        break;
    } else {
        // Sleep-wait for more data to arrive on the socket.
        // (Without this we would be wasting 100% of one CPU core while
        // downloading; with it we use roughly 1-2% of one core instead.)
        curl_multi_select($mh, 1);
    }
}

foreach ($ch as $index => $c) {
    $content[$index] = curl_multi_getcontent($c);
    curl_multi_remove_handle($mh, $c);
    // You can add some processing here using $content[$index]
}
curl_multi_close($mh);
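Applied to your downloadUrlToFile() case, a minimal sketch might look like this (my adaptation, untested; the $imageJobs array and its file names are made up, and the size check / resize would run after the handles complete, on the finished files):

// Map each new file name to its remote URL (hypothetical data).
$imageJobs = array(
    'product-1.jpg' => 'http://example.com/images/a.jpg',
    'product-2.jpg' => 'http://example.com/images/b.jpg',
);

$mh = curl_multi_init();
$ch = array();
$fp = array();

foreach ($imageJobs as $newfileName => $remoteurl) {
    $fp[$newfileName] = fopen('../images/products/' . $newfileName, 'w');
    $ch[$newfileName] = curl_init();
    curl_setopt($ch[$newfileName], CURLOPT_URL, $remoteurl);
    // Write straight to the file instead of returning the body.
    curl_setopt($ch[$newfileName], CURLOPT_FILE, $fp[$newfileName]);
    curl_setopt($ch[$newfileName], CURLOPT_TIMEOUT, 28800);
    curl_multi_add_handle($mh, $ch[$newfileName]);
}

$active = null;
do {
    curl_multi_exec($mh, $active);
    curl_multi_select($mh, 1); // wait for activity instead of busy-looping
} while ($active > 0);

foreach ($ch as $newfileName => $c) {
    curl_multi_remove_handle($mh, $c);
    fclose($fp[$newfileName]); // flush the file before post-processing
    // Post-processing (size check, resize, more variables) can happen here,
    // e.g. with getimagesize('../images/products/' . $newfileName).
}
curl_multi_close($mh);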

PHP CLI: writing and reading multiple raw TCP APIs in parallel, like curl_multi_init() for HTTPS APIs

My situation:
I have multiple servers running a raw TCP API that requires me to send a string to get information back. I need a response within a timeout of 5 seconds. All APIs should be contacted at the same time, and from then on they get 5 seconds to respond. (So the maximum execution time is 5 seconds for all servers at once.)
I already managed to do so for HTTP/S APIs with PHP cURL:
// array of curl handles
$multiCurl = array();
// data to be returned
$result = array();
// multi handle
$mh = curl_multi_init();

foreach ($row_apis as $api) {
    $id = $api[0];
    $ip = $api[1];
    $port = $api[2];
    // URL from which data will be fetched
    $fetchURL = "$ip:$port/api/status";
    $multiCurl[$id] = curl_init();
    curl_setopt($multiCurl[$id], CURLOPT_URL, $fetchURL);
    //curl_setopt($multiCurl[$id], CURLOPT_HEADER, 0);
    curl_setopt($multiCurl[$id], CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    curl_setopt($multiCurl[$id], CURLOPT_CUSTOMREQUEST, "GET");
    curl_setopt($multiCurl[$id], CURLOPT_TIMEOUT, 5);
    curl_setopt($multiCurl[$id], CURLOPT_RETURNTRANSFER, 1);
    curl_multi_add_handle($mh, $multiCurl[$id]);
}

$index = null;
do {
    curl_multi_exec($mh, $index);
} while ($index > 0);

// get content and remove handles
foreach ($multiCurl as $k => $ch) {
    $result[$k] = json_decode(curl_multi_getcontent($ch), true);
    curl_multi_remove_handle($mh, $ch);
}
// close
curl_multi_close($mh);
This sample fetches all APIs at once and waits up to 5 seconds for a response; it never takes longer than 5 seconds.
Is there a way to do the same thing with raw TCP APIs in PHP?
I already tried using sockets and was able to get the information, but each API was queried one after another, so the script takes way too long for multiple servers.
Thanks for your help.
EDIT:
I've tried to implement your suggestions and my code now looks like this:
$apis = array();
$apis[0] = array(1, "123.123.123.123", 1880);
$method = "summary";
$sockets = array();

// Create socket array
foreach ($apis as $api) {
    $id = $api[0];
    $ip = $api[1];
    $port = $api[2];
    $sockets[$id] = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    socket_set_nonblock($sockets[$id]);
    @socket_connect($sockets[$id], $ip, $port);
    //socket_write($sockets[$id], $method);
}

// Write to every socket
/*
foreach ($sockets as $socket) {
    socket_write($socket, $method);
    //fwrite($socket, "$method");
}
*/

// Read all sockets for 5 seconds
$write = NULL;
$except = NULL;
$responses = socket_select($sockets, $write, $except, 5);

// Check result
if ($responses === false) {
    echo "Did not work";
} elseif ($responses > 0) {
    echo "At least one has responded";
}

// Access the data
// ???
But I'm getting 0 as the result of socket_select...
When do I need to write the method to the socket?
And if I do get something back, how do I access the data in the response?
Absolutely. Set SO_SNDBUF to an appropriate size so you can send all the requests instantly/non-blockingly, then send all the requests, then start waiting for/reading the responses.
The easy way to do the reading is to call socket_set_block() on the sockets and read the responses one by one, but this doesn't give a hard guarantee of a 5-second timeout (then again, neither does your example curl_multi code). If you need a hard 5-second timeout, use socket_set_nonblock() and socket_select() instead.
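A minimal sketch of that suggestion (my illustration, untested: it assumes the $apis layout from the question and that each API answers within one read):

$apis = array(array(1, "123.123.123.123", 1880)); // same layout as above
$method = "summary";
$sockets = array();
$responses = array();

// Connect to every server without blocking.
foreach ($apis as $api) {
    list($id, $ip, $port) = $api;
    $sockets[$id] = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    socket_set_nonblock($sockets[$id]);
    @socket_connect($sockets[$id], $ip, $port); // returns false with "in progress"; that is expected
}

// A socket becomes writable once its connect has completed; write the request then.
$read = null;
$except = null;
$write = $sockets;
if (socket_select($read, $write, $except, 5) > 0) {
    foreach ($write as $socket) {
        socket_write($socket, $method);
    }
}

// Read answers until the 5-second deadline; a socket becomes readable
// once its response has arrived.
$deadline = microtime(true) + 5;
$pending = $sockets;
while ($pending && ($left = $deadline - microtime(true)) > 0) {
    $read = $pending;
    $write = null;
    $except = null;
    if (socket_select($read, $write, $except, 0, (int)($left * 1000000)) < 1) {
        break; // timed out: whatever is left in $pending never answered
    }
    foreach ($read as $id => $socket) {
        $responses[$id] = socket_read($socket, 8192); // assumes one read per response
        socket_close($socket);
        unset($pending[$id]);
    }
}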

Making multiple cURL requests without timing out

I am developing a system where I need to fetch 5000+ users' locations using multiple GET requests. Unfortunately the API endpoint doesn't support multiple client IDs, i.e. I have to make 5000+ unique GET requests to fetch their locations and use the cumulative response to make another API call.
I am using cURL to make the requests, with the following snippet:
<?php
function multiRequest($data, $options = array()) {
    // array of curl handles
    $curly = array();
    // data to be returned
    $result = array();
    // multi handle
    $mh = curl_multi_init();

    // loop through $data and create curl handles,
    // then add them to the multi handle
    foreach ($data as $id => $d) {
        $curly[$id] = curl_init();
        $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
        curl_setopt($curly[$id], CURLOPT_URL, $url);
        curl_setopt($curly[$id], CURLOPT_HEADER, 0);
        curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);
        // post?
        if (is_array($d)) {
            if (!empty($d['post'])) {
                curl_setopt($curly[$id], CURLOPT_POST, 1);
                curl_setopt($curly[$id], CURLOPT_POSTFIELDS, $d['post']);
            }
        }
        // extra options?
        if (!empty($options)) {
            curl_setopt_array($curly[$id], $options);
        }
        curl_multi_add_handle($mh, $curly[$id]);
    }

    // execute the handles
    $running = null;
    do {
        curl_multi_exec($mh, $running);
    } while ($running > 0);

    // get content and remove handles
    foreach ($curly as $id => $c) {
        $result[$id] = curl_multi_getcontent($c);
        curl_multi_remove_handle($mh, $c);
    }

    // all done
    curl_multi_close($mh);
    return $result;
}
?>
It works perfectly for a small number of requests, but when I try to hit 1000+ it times out.
$data = [];
for ($i = 0; $i < 1000; $i++) {
    $data[] = 'https://foo.bar/api/loc/v/queries/location?address=XXXXXXXXX';
}
$token = $this->refresh();
$r = $this->multiRequest($data, $token);
What is the best approach to solve this issue?
a. increase PHP's max_execution_time, or
b. use something like multithreading, or
c. something else?
Is there a way to modify the endpoint API to let it process multiple IDs? If yes, that is preferred, because if you run several thousand requests at the same time, you are effectively launching something like a DDoS attack.
However, you may want to check PHP's curl_multi_* functions (http://us3.php.net/manual/en/function.curl-multi-exec.php).
Another link that can be useful: http://www.onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/
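That second link describes a rolling window: keep a fixed number of handles in flight and replace each one as it finishes, so no transfer waits behind thousands of others. A minimal sketch of that idea (my own illustration, not the article's code; the $window size and 30-second timeout are arbitrary, and per-request options such as the token would be set next to CURLOPT_URL):

function rollingRequest(array $urls, $window = 100) {
    $result = array();
    $mh = curl_multi_init();

    // Start the next queued URL on a fresh easy handle.
    $start = function () use ($mh, &$urls) {
        reset($urls);
        $id = key($urls); // original array index
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $urls[$id]);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);  // arbitrary per-transfer cap
        curl_setopt($ch, CURLOPT_PRIVATE, $id); // remember which URL this was
        curl_multi_add_handle($mh, $ch);
        unset($urls[$id]);
    };

    // Prime the window.
    for ($i = 0; $i < $window && $urls; $i++) {
        $start();
    }

    $running = 1; // handles were just added, so assume work is pending
    while ($urls || $running > 0) {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh, 1); // wait for socket activity instead of spinning
        // Harvest finished transfers and refill the window.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $id = curl_getinfo($ch, CURLINFO_PRIVATE);
            $result[$id] = curl_multi_getcontent($ch);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            if ($urls) {
                $start();
                $running++; // count the freshly queued transfer
            }
        }
    }

    curl_multi_close($mh);
    return $result;
}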

How to load multiple external files with PHP - fast?

Does anyone know the best (or a really good) way to load external files (about 10-20) from an API with performance in mind? Each session has different content. Currently I use file_get_contents, but I'm experiencing serious performance issues. I'm not really familiar with cURL, but performance-wise it seems to beat the good old PHP way. Any ideas/examples?
You could also use curl multi to grab multiple files at once, there is a tutorial here:
http://www.phpied.com/simultaneuos-http-requests-in-php-with-curl/
<?php
// Copy & pasted from the above link
function multiRequest($data, $options = array()) {
    // array of curl handles
    $curly = array();
    // data to be returned
    $result = array();
    // multi handle
    $mh = curl_multi_init();

    // loop through $data and create curl handles,
    // then add them to the multi handle
    foreach ($data as $id => $d) {
        $curly[$id] = curl_init();
        $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
        curl_setopt($curly[$id], CURLOPT_URL, $url);
        curl_setopt($curly[$id], CURLOPT_HEADER, 0);
        curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);
        // post?
        if (is_array($d)) {
            if (!empty($d['post'])) {
                curl_setopt($curly[$id], CURLOPT_POST, 1);
                curl_setopt($curly[$id], CURLOPT_POSTFIELDS, $d['post']);
            }
        }
        // extra options?
        if (!empty($options)) {
            curl_setopt_array($curly[$id], $options);
        }
        curl_multi_add_handle($mh, $curly[$id]);
    }

    // execute the handles
    $running = null;
    do {
        curl_multi_exec($mh, $running);
    } while ($running > 0);

    // get content and remove handles
    foreach ($curly as $id => $c) {
        $result[$id] = curl_multi_getcontent($c);
        curl_multi_remove_handle($mh, $c);
    }

    // all done
    curl_multi_close($mh);
    return $result;
}
?>
<?php
$data = array(
    'http://search.yahooapis.com/VideoSearchService/V1/videoSearch?appid=YahooDemo&query=Pearl+Jam&output=json',
    'http://search.yahooapis.com/ImageSearchService/V1/imageSearch?appid=YahooDemo&query=Pearl+Jam&output=json',
    'http://search.yahooapis.com/AudioSearchService/V1/artistSearch?appid=YahooDemo&artist=Pearl+Jam&output=json'
);

$r = multiRequest($data);

echo '<pre>';
print_r($r);
/*
Array
(
    [0] => {"ResultSet":{"totalResultsAvailable":"633","totalResultsReturned":...
    [1] => {"ResultSet":{"totalResultsAvailable":"105342","totalResultsReturned":...
    [2] => {"ResultSet":{"totalResultsAvailable":10,"totalResultsReturned":...
)
*/
?>
Or see the PHP cURL docs: http://www.php.net/manual/en/function.curl-multi-init.php
file_get_contents is probably the fastest method because it only requires one function call in PHP, whereas other methods such as fopen/fread/fclose take multiple calls, as does cURL.
However, fopen has the advantage of not requiring RAM equal to or greater than the file size, because you can handle the file piece by piece.
Overall, file_get_contents is a good general-purpose function, but depending on your circumstances you may need a different option.
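To illustrate the memory point: with fopen you can stream a large response in chunks instead of holding it all in RAM at once (a small sketch; the URL and chunk size are placeholders):

$in = fopen('http://example.com/big-file.zip', 'r'); // hypothetical URL
$out = fopen('/tmp/big-file.zip', 'w');
while (!feof($in)) {
    fwrite($out, fread($in, 8192)); // 8 KB at a time, regardless of total file size
}
fclose($in);
fclose($out);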

How would I automate my array to be used with cURL?

I have an array containing the contents of a MySQL table. I need to put each of these rows into curl_multi handles so that I can execute them all simultaneously.
Here is the code for the array, in case it helps:
$SQL = mysql_query("SELECT url FROM urls") or die(mysql_error());
while ($resultSet = mysql_fetch_array($SQL)) {
    $urls[] = $resultSet;
}
So I need to be able to send data to each URL at the same time. I don't need to get any data back, and in fact I'll be having them time out after two seconds. It only needs to send the data and then close.
My code prior to this was executing them one at a time. Here is that code:
$SQL = mysql_query("SELECT url FROM shells") or die(mysql_error());
while ($resultSet = mysql_fetch_array($SQL)) {
    $ch = curl_init($resultSet['url'] . $fullcurl); // load the URL and send GET data
    curl_setopt($ch, CURLOPT_TIMEOUT, 2);           // only load it for two seconds (long enough to send the data)
    curl_exec($ch);
    curl_close($ch);
}
So my question is: how can I load the contents of the array into a multi handle, execute it, and then remove each handle and close the multi handle?
You still call curl_init and curl_setopt. Then you load the handles into a multi handle and keep calling curl_multi_exec until it's done. This is based on the documentation at curl_multi_init. Since you're timing out after two seconds and not processing responses, I think you can just sleep for two seconds at a time. curl_multi_select might be better if you actually need to process the responses.
$SQL = mysql_query("SELECT url FROM shells");
$mh = curl_multi_init();
$handles = array();

while ($resultSet = mysql_fetch_array($SQL)) {
    // load the url and send GET data
    $ch = curl_init($resultSet['url'] . $fullcurl);
    // only load it for two seconds (long enough to send the data)
    curl_setopt($ch, CURLOPT_TIMEOUT, 2);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Create a status variable so we know when exec is done.
$running = null;
// execute the handles
do {
    // Call exec. This call is non-blocking, meaning it works in the background.
    curl_multi_exec($mh, $running);
    // Sleep while it's executing. You could do other work here, if you have any.
    sleep(2);
    // Keep going until it's done.
} while ($running > 0);

// Remove (close) the regular handles.
foreach ($handles as $ch) {
    curl_multi_remove_handle($mh, $ch);
}
// Close the multi handle
curl_multi_close($mh);
If I were you, I would write a MySQL class and a cURL class; that works very well.
First I would create a method which returns all URLs from a passed MySQL result. Something like:
public function getUrls($mysql_fetch_array)
{
    foreach ($mysql_fetch_array as $result) {
        $urls[] = $result["url"];
    }
    return $urls;
}
Then you could write a method like curlSend($url, $param):
// Remember, you will have to adapt this; I don't know your full code,
// so this is just one way you could do it.
public function curlSend($url, $param = "")
{
    $ch = curl_init($url . $param); // load the URL and send GET data
    curl_setopt($ch, CURLOPT_TIMEOUT, 2); // only load it for two seconds (long enough to send the data)
    curl_exec($ch);
    curl_close($ch);
}

public function send()
{
    $urls = $this->getUrls($this->mysql->result($sql));
    foreach ($urls as $url) {
        $this->curlSend($url);
    }
}
Now this is how you could do it.
