How would I automate my array to be used with cURL? - php

I have an array containing the contents of a MySQL table. I need to put each of these entries into a curl_multi handle so that I can execute them all simultaneously.
Here is the code for the array, in case it helps:
$SQL = mysql_query("SELECT url FROM urls") or die(mysql_error());
while($resultSet = mysql_fetch_array($SQL)){
$urls[] = $resultSet;
}
So I need to be able to send data to each URL at the same time. I don't need to get any data back, and in fact I'll have the requests time out after two seconds. Each one only needs to send the data and then close.
My previous code executed them one at a time. Here is that code:
$SQL = mysql_query("SELECT url FROM shells") or die(mysql_error());
while($resultSet = mysql_fetch_array($SQL)){
$ch = curl_init($resultSet['url'] . $fullcurl); //load the urls and send GET data
curl_setopt($ch, CURLOPT_TIMEOUT, 2); //Only load it for two seconds (Long enough to send the data)
curl_exec($ch);
curl_close($ch);
}
So my question is: How can I load the contents of the array into curl_multi_handle, execute it, and then remove each handle and close the curl_multi_handle?

You still call curl_init and curl_setopt for each URL. Then you add each handle to a multi handle and keep calling curl_multi_exec until nothing is running. This is based on the documentation for curl_multi_init. Note that curl_multi_exec only advances the transfers while it is being called, so sleeping for a full two seconds between calls may let your two-second timeout expire before the requests have been sent. Use curl_multi_select to wait for activity between calls instead.
$SQL = mysql_query("SELECT url FROM shells");
$mh = curl_multi_init();
$handles = array();
while($resultSet = mysql_fetch_array($SQL)){
//load the urls and send GET data
$ch = curl_init($resultSet['url'] . $fullcurl);
//Only load it for two seconds (Long enough to send the data)
curl_setopt($ch, CURLOPT_TIMEOUT, 2);
curl_multi_add_handle($mh, $ch);
$handles[] = $ch;
}
// Create a status variable so we know when exec is done.
$running = null;
//execute the handles
do {
// Call exec. This call is non-blocking: it advances every transfer a little and returns.
curl_multi_exec($mh, $running);
// Wait up to one second for activity on any handle instead of spinning.
// You could do other work here, if you have any.
if ($running > 0) {
curl_multi_select($mh, 1.0);
}
// Keep going until it's done.
} while ($running > 0);
// Loop over the individual handles to remove and close them.
foreach($handles as $ch)
{
// Detach the handle from the multi handle, then close it.
curl_multi_remove_handle($mh, $ch);
curl_close($ch);
}
// Close the multi handle
curl_multi_close($mh);

If I were you, I would write a mysql class and a curl class.
That structure works well overall.
First, I would create a method which returns all URLs from a passed MySQL result.
Something like:
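public function getUrls($mysql_fetch_array)
{
// The argument is assumed to be an array of fetched rows, each with a "url" key.
$urls = array();
foreach($mysql_fetch_array as $result)
{
$urls[] = $result["url"];
}
return $urls;
}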
Then you could write a method like curlSend($url, $param):
// Remember that you will have to adapt this. I don't know your full code, so this is just
// one way you could do it.
public function curlSend($url,$param="")
{
$ch = curl_init($url . $param); //load the url and send GET data
curl_setopt($ch, CURLOPT_TIMEOUT, 2); //Only load it for two seconds (Long enough to send the data)
curl_exec($ch);
curl_close($ch);
}
public function send()
{
// $sql stands for your SELECT query; getUrls() is a method of this class.
$urls = $this->getUrls($this->mysql->result($sql));
foreach($urls as $url)
{
$this->curlSend($url);
}
}
That is one way you could do it.

Related

What would be the best way to collect the titles (in bulk) of a subreddit

I am looking to collect the titles of all of the posts on a subreddit, and I wanted to know what would be the best way of going about this?
I've looked around and found some stuff talking about Python and bots. I've also had a brief look at the API and am unsure in which direction to go.
I do not want to commit to one approach only to find out 90% of the way through that it won't work, so could someone point me in the right direction regarding the language and any extras needed (for example, pip for Python)?
My own experience is in web languages such as PHP, so I initially thought a web app would do the trick, but I am unsure whether this would be the best way or how to go about it.
So my question is:
What would be the best way to collect the titles (in bulk) of a subreddit?
Or, if that is too subjective:
How do I retrieve and store all the post titles of a subreddit?
Preferably, the solution needs to:
- fetch more than one page of (25) results
- save to a .txt file
Thanks in advance.
PHP; in 25 lines:
$subreddit = 'pokemon';
$max_pages = 10;
// Set variables with default data
$page = 0;
$after = '';
$titles = '';
do {
$url = 'http://www.reddit.com/r/' . $subreddit . '/new.json?limit=25&after=' . $after;
// Set URL you want to fetch
$ch = curl_init($url);
// Exclude response headers from the output (we don't need them)
curl_setopt($ch, CURLOPT_HEADER, 0);
// Keep NOBODY off, since we need the response body
curl_setopt($ch, CURLOPT_NOBODY, 0);
// Set curl timeout of 5 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
// Set curl to return output as string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Execute curl
$output = curl_exec($ch);
// Get HTTP code of request
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
// Close curl
curl_close($ch);
// If http code is 200 (success)
if ($status == 200) {
// Decode JSON into PHP object
$json = json_decode($output);
// Set after for next curl iteration (reddit's pagination)
$after = $json->data->after;
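// Loop through each post and collect its title
foreach ($json->data->children as $k => $v) {
$titles .= $v->data->title . "\n";
}
}
// Stop when there is no further page; an empty "after" would fetch the first page again
if (empty($after)) {
break;
}
// Increment page number
$page++;
// Loop while the current page number is less than the maximum number of pages
} while ($page < $max_pages);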
// Save titles to text file
file_put_contents(dirname(__FILE__) . '/' . $subreddit . '.txt', $titles);
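A minimal sketch of separating the parsing from the fetching, assuming the listing JSON has the data.children[].data.title and data.after shape that the code above relies on. The function name extractTitles is hypothetical. Reddit may also throttle requests that carry a generic user agent, so setting CURLOPT_USERAGENT to a descriptive string is worth considering.

```php
<?php
// Hypothetical helper: pull the post titles and the pagination cursor
// out of one page of listing JSON. Returns empty results on bad input.
function extractTitles(string $json): array
{
    $data = json_decode($json, true);
    $titles = [];
    foreach ($data['data']['children'] ?? [] as $child) {
        if (isset($child['data']['title'])) {
            $titles[] = $child['data']['title'];
        }
    }
    return [
        'titles' => $titles,
        // null means there is no further page to request
        'after'  => $data['data']['after'] ?? null,
    ];
}
```

The fetch loop can then append the returned titles and stop as soon as 'after' comes back as null.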

Looping for certain count for curl post

I am new here and hoping for your kind advice. Thanks in advance.
I have written an HTTP API to send SMS using curl. Everything is working fine, except that I cannot work out how to loop and post via curl for batches of phone numbers. For example: a user uploads 50000 phone numbers in an Excel sheet on my site, I fetch all the mobile numbers from the database, and then post them through curl.
The SMS gateway I send the request to accepts a maximum of 10000 numbers at once via its HTTP API.
So I want to split the 50000 fetched numbers into batches of 10000 each, loop over the batches, and send a curl post for each one.
Here is my code
//have taken care of sql injection on live site
$resultRestore = mysql_query("SELECT * FROM temptable WHERE userid = '".$this->user_id."' AND uploadid='".$uploadid."' ");
$rowRestoreCount = mysql_num_rows($resultRestore);
#mysql_data_seek($resultRestore, 0);
$phone_list = "";
while($rowRestore = mysql_fetch_array($resultRestore))
{
$phone_list .= $rowRestore['recphone'].",";
}
$url = "http://www.smsgatewaycenter.com/library/send_sms_2.php?UserName=".urlencode($this->param['userid'])."&Password=".urlencode($this->param['password'])."&Type=Bulk&To=".urlencode(substr($phone_list, 0, -1))."&Mask=".urlencode($this->sendname)."&Message=Hello%20World";
//echo $url;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
Now, I need to loop over $phone_list in batches of 10000 numbers. How can I achieve this?
It's been 2 days; I have tried several things and am not getting the result.
Kindly help...
NOTE: I'm going to start off with the obligatory warning about using mysql functions. Please consider switching to mysqli or PDO.
There are a number of different ways you could do this. Personally, I would reconfigure your script to only fetch 10,000 numbers at a time from the database and put that inside a loop. It might look something like this (note that for simplicity I am not updating your mysql* calls to mysqli*). Keep in mind that I didn't run this code, since I can't actually test most of yours:
// defines where the query starts from
$offset= 0;
// defines how many to get with the query
$limit = 10000;
// set up base SQL to use over and over updating offset
$baseSql = "SELECT * FROM temptable WHERE userid = '".$this->user_id."' AND uploadid='".$uploadid."' LIMIT ";
// get first set of results
$resultRestore = mysql_query($baseSql . $offset . ', '. $limit);
// now loop
while (mysql_num_rows($resultRestore) > 0)
{
$rowRestoreCount = mysql_num_rows($resultRestore);
$phone_list = "";
while($rowRestore = mysql_fetch_array($resultRestore))
{
$phone_list .= $rowRestore['recphone'].",";
}
$url = "http://www.smsgatewaycenter.com/library/send_sms_2.php?UserName=".urlencode($this->param['userid'])."&Password=".urlencode($this->param['password'])."&Type=Bulk&To=".urlencode(substr($phone_list, 0, -1))."&Mask=".urlencode($this->sendname)."&Message=Hello%20World";
//echo $url;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
// now update for the while loop
// increment by value of limit
$offset += $limit;
// now re-query for the next 10000
// this will continue until there are no records left to retrieve
// this should work even if there are 50,123 records (the last loop will process 123 records)
$resultRestore = mysql_query($baseSql . $offset . ', '. $limit);
}
You could also achieve this without using offset and limit in your sql query. This might be a simpler approach for you:
// define our maximum chunk here
$max = 10000;
$resultRestore = mysql_query("SELECT * FROM temptable WHERE userid = '".$this->user_id."' AND uploadid='".$uploadid."' ");
$rowRestoreCount = mysql_num_rows($resultRestore);
#mysql_data_seek($resultRestore, 0);
$phone_list = "";
// hold the current number of processed phone numbers
$count = 0;
while($rowRestore = mysql_fetch_array($resultRestore))
{
$phone_list .= $rowRestore['recphone'].",";
$count++;
// when count hits our max, do the send
if ($count >= $max)
{
$url = "http://www.smsgatewaycenter.com/library/send_sms_2.php?UserName=".urlencode($this->param['userid'])."&Password=".urlencode($this->param['password'])."&Type=Bulk&To=".urlencode(substr($phone_list, 0, -1))."&Mask=".urlencode($this->sendname)."&Message=Hello%20World";
//echo $url;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
// now reset count back to zero
$count = 0;
// and reset phone_list
$phone_list = '';
}
}
// if we don't have # of phones evenly divisible by $max then handle any leftovers
if ($count > 0)
{
$url = "http://www.smsgatewaycenter.com/library/send_sms_2.php?UserName=".urlencode($this->param['userid'])."&Password=".urlencode($this->param['password'])."&Type=Bulk&To=".urlencode(substr($phone_list, 0, -1))."&Mask=".urlencode($this->sendname)."&Message=Hello%20World";
//echo $url;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
}
I notice that you are retrieving the information in $curl_scraped_page. In either of these scenarios above, you will need to account for the new loop if you're doing any processing on $curl_scraped_page.
Again, please consider switching to mysqli or PDO, and keep in mind that there are likely more efficient and flexible ways to achieve this than what you are doing here. For example, you might want to log successful sends in case your script breaks and incorporate that into your script (for example, by selecting from the database only those numbers that have not yet received this text). This would allow you to re-run your script but only send to those who did NOT yet receive the text, rather than hitting everyone again (or maybe your SMS gateway handles that for you?)
EDIT
Another approach would be to load all the retrieved numbers into a single array, then chunk the array into pieces and process each chunk.
$numbers = array();
while ($rowRestore = mysql_fetch_array($resultRestore))
{
$numbers[] = $rowRestore['recphone'];
}
// split into chunks of 10,000
$chunks = array_chunk($numbers, 10000);
// loop and process the chunks
foreach ($chunks AS $chunk)
{
// $chunk will be an array, so implode it with comma to get the phone list
$phone_list = implode(',', $chunk);
// note that there is no longer a need to substr -1 the $phone_list because it won't have a trailing comma using implode()
$url = "http://www.smsgatewaycenter.com/library/send_sms_2.php?UserName=".urlencode($this->param['userid'])."&Password=".urlencode($this->param['password'])."&Type=Bulk&To=".urlencode($phone_list)."&Mask=".urlencode($this->sendname)."&Message=Hello%20World";
//echo $url;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
}
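The chunking step can also be pulled into a small function that is easy to test without contacting the gateway. This is only a sketch: buildBatchUrls, the endpoint, and the parameter names passed in are assumptions for illustration, and http_build_query is used here in place of the manual urlencode() concatenation.

```php
<?php
// Hypothetical helper: split the numbers into batches and build one
// request URL per batch. Nothing is sent here.
function buildBatchUrls(array $numbers, int $batchSize, string $endpoint, array $params): array
{
    $urls = [];
    foreach (array_chunk($numbers, $batchSize) as $chunk) {
        // implode() leaves no trailing comma, so no substr() is needed
        $query = http_build_query(
            $params + ['To' => implode(',', $chunk)],
            '',
            '&',
            PHP_QUERY_RFC3986 // encode spaces as %20, as in the original URL
        );
        $urls[] = $endpoint . '?' . $query;
    }
    return $urls;
}
```

Each returned URL can then be passed to curl_init() inside a foreach, exactly as in the loop above. With 50,123 numbers and a batch size of 10000 this yields six URLs, the last one carrying 123 numbers.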

Identify specific curl multi response

I use curl_multi_exec() to request several websites in parallel. Say, URL1, URL2, and URL3. As soon as one of these websites returns a result, I can process it and then wait for the next response.
Now I need to know, based on the response, which request it belongs to. I cannot simply check the URL from the response, as there might be redirections. So what is the best way to identify which URL (URL1, URL2, or URL3) a response came from? Can the information from curl_multi_info_read() or curl_getinfo() somehow be used for that? Is there a cURL option that I can set and query for that?
I also tried storing the cURL handles before requesting the URLs and comparing them with curl_multi_info_read($curlMultiHandle)['handle'], but as this is a resource, it is not really comparable.
Any ideas?
It is possible to attach custom data to a handle:
curl_setopt($handle, \CURLOPT_PRIVATE, json_encode(['id' => $query_id]));
and then fetch this data later from the same handle:
curl_getinfo($handle, \CURLINFO_PRIVATE);
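A short sketch of how that could fit into a curl_multi loop. The helper names makeTaggedHandle, tagOf, and collectFinished are hypothetical, and plain string labels are assumed as tags. Only CURLOPT_PRIVATE, CURLINFO_PRIVATE, and the standard curl_multi_* calls come from the cURL extension itself.

```php
<?php
// Hypothetical helper: create a handle that carries its own label.
function makeTaggedHandle(string $url, string $tag)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_PRIVATE, $tag);
    return $ch;
}

// Read the label back from a handle, whatever URL it ended up at.
function tagOf($ch): string
{
    return (string) curl_getinfo($ch, CURLINFO_PRIVATE);
}

// Drain the finished transfers of a multi handle, keyed by label.
function collectFinished($mh): array
{
    $done = [];
    while ($info = curl_multi_info_read($mh)) {
        $ch = $info['handle'];
        $done[tagOf($ch)] = [
            'ok'   => $info['result'] === CURLE_OK,
            'body' => curl_multi_getcontent($ch),
        ];
        curl_multi_remove_handle($mh, $ch);
    }
    return $done;
}
```

Calling collectFinished($mh) after each curl_multi_exec() round returns the responses that completed in that round, so a redirect no longer matters for identifying them.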
Suppose you have multiple Image objects for which you need to load the data. You run your requests in parallel and don't know the order in which the downloads complete, so you have to identify the concrete Image object somehow when you receive the data. Instead of using URLs (which might change after a redirection) as keys in an associative array of Image objects, I recommend the following simple approach.
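$mh = curl_multi_init();
$activeHandles = array();
$loadingImages = array();
function loadImage(Image $image) {
// These are plain functions, so the shared state has to be imported explicitly.
global $mh, $activeHandles, $loadingImages;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $image->getUrl());
curl_multi_add_handle($mh, $ch);
...
// Both arrays are appended to together, so an image and its handle share the same index.
$loadingImages[] = $image;
$activeHandles[] = $ch;
}
function retrieveImages() {
global $mh, $activeHandles, $loadingImages;
// Somewhere you run curl_multi_exec($mh, $running).
// Here you get the results.
while ($result = curl_multi_info_read($mh)) {
// How to get the data is out of our scope.
// We are interested in identifying the image object.
$ch = $result['handle'];
// Strict comparison, so that only the exact same handle matches.
$idx = array_search($ch, $activeHandles, true);
$image = $loadingImages[$idx];
// CURLE_OK means the transfer finished without a cURL-level error.
$success = ($result['result'] === CURLE_OK);
if ($success) {
// Don't forget to free resources!
unset($activeHandles[$idx]);
unset($loadingImages[$idx]);
curl_multi_remove_handle($mh, $ch);
........
}
}
}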

PHP Scraper appears to be in an infinite loop

(I'm scraping this stuff with the permission of the website in question, by the way).
It's a pretty simple web scraper. It was working fine when I was entering all the links by hand, but now that I load them via JSON and variables (so I can do lots of scraping with the one script and make the process more modular by just adding more links to the JSON), it runs in an infinite loop.
(Page has been loading for about 15 minutes now)
Here is my JSON. Only one store is in there for testing purposes but there is going to be about 15 more.
[
{
"store":"Incu Men",
"cat":"Accessories",
"general_cat":"Accessories",
"spec_cat":"accessories",
"url":"http://www.incuclothing.com/shop-men/accessories/",
"baseurl":"http://www.incuclothing.com",
"next_select":"a.next",
"prod_name_select":".infobox .fn",
"label_name_select":".infobox .brand",
"desc_select":".infobox .description",
"price_select":"#price",
"mainImg_select":"",
"more_imgs":".product-images",
"product_url":".hproduct .photo-link"
}
]
Here is the PHP scraper code:
<?php
//Set infinite time limit
set_time_limit (0);
// Include simple html dom
include('simple_html_dom.php');
// Defining the basic cURL function
function curl($url) {
$ch = curl_init();
// Initialising cURL
curl_setopt($ch, CURLOPT_URL, $url);
// Setting cURL's URL option with the $url variable passed into the function
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
// Setting cURL's option to return the webpage data
$data = curl_exec($ch);
// Executing the cURL request and assigning the returned data to the $data variable
curl_close($ch);
// Closing cURL
return $data;
// Returning the data from the function
}
function getLinks($catURL, $prodURL, $baseURL, $next_select) {
$urls = array();
while($catURL) {
echo "Indexing: $url" . PHP_EOL;
$html = str_get_html(curl($catURL));
foreach ($html->find($prodURL) as $el) {
$urls[] = $baseURL . $el->href;
}
$next = $html->find($next_select, 0);
$url = $next ? $baseURL . $next->href : null;
echo "Results: $next" . PHP_EOL;
}
return $urls;
}
$string = file_get_contents("jsonWorkers/incuMens.json");
$json_array = json_decode($string,true);
foreach ($json_array as $value){
$baseURL = $value['baseurl'];
$catURL = $value['url'];
$store = $value['store'];
$general_cat = $value['general_cat'];
$spec_cat = $value['spec_cat'];
$next_select = $value['next_select'];
$prod_name = $value['prod_name_select'];
$label_name = $value['label_name_select'];
$description = $value['desc_select'];
$price = $value['price_select'];
$prodURL = $value['product_url'];
if (!is_null($value['mainImg_select'])){
$mainImg = $value['mainImg_select'];
}
$more_imgs = $value['more_imgs'];
$allLinks = getLinks($catURL, $prodURL, $baseURL, $next_select);
}
?>
Any ideas why the script would be running infinitely and not returning, stopping, or printing anything to the screen? I'm just going to let it run until it stops. When I was doing this by hand it would only take a minute or so, sometimes less, so I'm sure it's a problem with my variables or JSON, but I can't for the life of me see where the issue lies.
Can anyone take a quick look and point me in the right direction?
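There is a problem with your while($catURL) loop: $catURL is never changed inside it. The URL of the next page is assigned to $url instead, so the condition stays true and the same category page is fetched again and again. What do you want to do?
Moreover, you can force output to be displayed in your browser while the script is still running with the flush() command.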
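One possible shape for a loop that does terminate, sketched under assumptions: the loop variable is the one that gets reassigned, and already visited pages are skipped. collectLinks and the $fetchPage callable are hypothetical. In the scraper above, $fetchPage would wrap curl() and str_get_html() and return the product links plus the next page URL (or null).

```php
<?php
// Hypothetical helper: follow "next" links until there are none left.
// $fetchPage($url) must return ['links' => string[], 'next' => ?string].
function collectLinks(string $startUrl, callable $fetchPage, int $maxPages = 100): array
{
    $links = [];
    $seen = [];
    $url = $startUrl;
    // Stop on: no next page, a page we already visited, or the page cap.
    while ($url !== null && !isset($seen[$url]) && count($seen) < $maxPages) {
        $seen[$url] = true;
        $page = $fetchPage($url);
        foreach ($page['links'] as $link) {
            $links[] = $link;
        }
        // Advance the same variable that the loop condition tests.
        $url = $page['next'] ?? null;
    }
    return $links;
}
```

The visited set and the page cap are extra safety nets, so that a site whose last page links back to the first one cannot keep the script running forever.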

cURL Multi Simultaneous Requests (domain check)

I'm trying to take a list of 20,000+ domain names and check if they are "alive". All I really need is a simple HTTP code check, but I can't figure out how to get that working with curl_multi. In a separate script I have the following function, which simultaneously checks a batch of 1000 domains and returns the json response code. Maybe this can be modified to just get the HTTP response code instead of the page content?
(Sorry about the syntax; I couldn't get it to paste as a nice block of code without going line by line and adding 4 spaces. I also tried skipping a line and adding 8 spaces.)
$dotNetRequests = array of domains...
//loop through arrays
foreach(array_chunk($dotNetRequests, 1000) as $Netrequests) {
$results = checkDomains($Netrequests);
$NetcurlRequest = array_merge($NetcurlRequest, $results);
}
function checkDomains($data) {
// array of curl handles
$curly = array();
// data to be returned
$result = array();
// multi handle
$mh = curl_multi_init();
// loop through $data and create curl handles
// then add them to the multi-handle
foreach ($data as $id => $d) {
$curly[$id] = curl_init();
$url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
curl_setopt($curly[$id], CURLOPT_URL, $url);
curl_setopt($curly[$id], CURLOPT_HEADER, 0);
curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);
// post?
if (is_array($d)) {
if (!empty($d['post'])) {
curl_setopt($curly[$id], CURLOPT_POST, 1);
curl_setopt($curly[$id], CURLOPT_POSTFIELDS, $d['post']);
}
}
curl_multi_add_handle($mh, $curly[$id]);
}
// execute the handles
$running = null;
do {
curl_multi_exec($mh, $running);
} while($running > 0);
// get content and remove handles
foreach($curly as $id => $c) {
// $result[$id] = curl_multi_getcontent($c);
// if($result[$id]) {
if (curl_multi_getcontent($c)){
//echo "yes";
$netName = $data[$id];
$dName = str_replace(".net", ".com", $netName);
$query = "Update table1 SET dotnet = '1' WHERE Domain = '$dName'";
mysql_query($query);
}
curl_multi_remove_handle($mh, $c);
}
// all done
curl_multi_close($mh);
return $result;
}
In any other language you would thread this kind of operation ...
https://github.com/krakjoe/pthreads
And you can in PHP too :)
I would suggest a few workers rather than 20,000 individual threads. Not that 20,000 threads is out of the realms of possibility (it isn't), but that wouldn't be a good use of resources. I would do as you are doing now and have 20 workers getting the results of 1000 domains each.
I assume you don't need me to give an example of getting a response code; I'm sure curl would give it to you. But it's probably overkill to use curl, since you do not require its threading capabilities: I would fsockopen port 80, fprintf "GET / HTTP/1.0\r\n\r\n", fgets the first line, and close the connection. If you're going to be doing this all the time, then I would also send Connection: close so that the receiving machines are not holding connections open unnecessarily.
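A rough sketch of that socket approach, assuming plain HTTP on port 80 and a two-second timeout. The function names parseStatusLine and probeStatus are hypothetical, and a domain that only answers over HTTPS would need a different transport.

```php
<?php
// Hypothetical helper: extract the numeric code from a status line
// such as "HTTP/1.1 200 OK". Returns null if the line does not match.
function parseStatusLine(string $line): ?int
{
    if (preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})\b#', $line, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: open a socket, send a minimal request, and read
// only the first response line. Returns null when the host is unreachable.
function probeStatus(string $host, float $timeout = 2.0): ?int
{
    $fp = @fsockopen($host, 80, $errno, $errstr, $timeout);
    if ($fp === false) {
        return null;
    }
    stream_set_timeout($fp, (int) ceil($timeout));
    fwrite($fp, "GET / HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");
    $line = fgets($fp, 512);
    fclose($fp);
    return $line === false ? null : parseStatusLine($line);
}
```

If you stay with curl handles instead, curl_getinfo($c, CURLINFO_HTTP_CODE) returns the status code once a transfer has finished, and CURLOPT_NOBODY avoids downloading the page content.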
This script works great for handling bulk simultaneous cURL requests using PHP.
I'm able to parse through 50k domains in just a few minutes using it!
https://github.com/petewarden/ParallelCurl/
