Extract some data from an array in PHP while using cURL

I'm using cURL.
I got the result in a variable named $resp, as you can see below, but I don't know how to save only the first 19 characters into another array element like $var1[0].
Right now I only print them. (They are accessible like $resp[0] = 'a', $resp[1] = 'b', $resp[2] = 'c', ... and I want the first 19 characters saved into one cell, like $var1[0] = 'abc...'.)
I also tried implode and array_merge, but with no success.
$curl = curl_init();
// Set some options - we are passing in a useragent too here
curl_setopt_array($curl, array(
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_URL => $var
    //CURLOPT_USERAGENT => 'Codular Sample cURL Request'
));
// Send the request & save response to $resp
$resp = curl_exec($curl);
echo($resp);
// Close request to clear up some resources
curl_close($curl);
for ($i = 0; $i <= 18; ++$i) {
    echo($resp[$i]);
}

Assuming that a character is stored at each index of $resp, you could do the following:
$nineteen = '';
for ($i = 0; $i <= 18; ++$i) {
    $nineteen .= $resp[$i];
}
$var = array($nineteen);
This code appends all 19 chars to a string and then puts that string into the $var array. Simple + readable.
You can also do this with str_split(), array_slice(), and implode(), or in a single call with substr(), since curl_exec() with CURLOPT_RETURNTRANSFER returns the response as a string; the approaches have roughly the same performance.
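For instance, a minimal sketch of the substr() route, assuming $resp holds the response string from the snippet above:
// Take the first 19 characters of the response string in one call.
$var1 = array(substr($resp, 0, 19));
echo $var1[0];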

Related

Something is not working in cURL scraping

I am trying to scrape torrentz2.eu search results using the old (dead) torrentz.eu scraper code.
When I run http://localhost/jits/torz/api.php?key=kabali
it shows me a warning and a null value:
Notice: Undefined variable: results_urls in /Applications/XAMPP/xamppfiles/htdocs/jits/torz/api.php on line 59
null
Why? Can anybody tell me what's wrong with the code?
Here is the code:
<?php
$t = $_GET['key'];
// Defining the basic cURL function
function curl($url) {
    // Assigning cURL options to an array
    $options = array(
        CURLOPT_RETURNTRANSFER => TRUE, // Setting cURL's option to return the webpage data
        CURLOPT_FOLLOWLOCATION => TRUE, // Setting cURL to follow 'location' HTTP headers
        CURLOPT_AUTOREFERER => TRUE, // Automatically set the referer when following 'location' HTTP headers
        CURLOPT_CONNECTTIMEOUT => 120, // Setting the amount of time (in seconds) before the request times out
        CURLOPT_TIMEOUT => 120, // Setting the maximum amount of time for cURL to execute queries
        CURLOPT_MAXREDIRS => 10, // Setting the maximum number of redirections to follow
        CURLOPT_USERAGENT => "Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0", // Setting the useragent
        CURLOPT_URL => $url, // Setting cURL's URL option with the $url variable passed into the function
    );
    $ch = curl_init(); // Initialising cURL
    curl_setopt_array($ch, $options); // Setting cURL's options using the previously assigned array data in $options
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch); // Closing cURL
    return $data; // Returning the data from the function
}
?>
?>
<?php
// Defining the basic scraping function
function scrape_between($data, $start, $end) {
    $data = stristr($data, $start); // Stripping all data from before $start
    $data = substr($data, strlen($start)); // Stripping $start
    $stop = stripos($data, $end); // Getting the position of the $end of the data to scrape
    $data = substr($data, 0, $stop); // Stripping all data from after and including the $end of the data to scrape
    return $data; // Returning the scraped data from the function
}
?>
<?php
$url = "https://torrentz2.eu/search?f=$t"; // Assigning the URL we want to scrape to the variable $url
$results_page = curl($url); // Downloading the results page using our curl() function
//var_dump($results_page);
//die();
$results_page = scrape_between($results_page, "<dl><dt>", "<a href=\"http://www.viewme.com/search?q=$t\" title=\"Web search results on ViewMe\">"); // Scraping out only the middle section of the results page that contains our results
$separate_results = explode("</dd></dl>", $results_page); // Exploding the results into separate parts in an array
// For each separate result, scrape the URL
foreach ($separate_results as $separate_result) {
    if ($separate_result != "") {
        $results_urls[] = scrape_between($separate_result, "\">", "<b>"); // Scraping the result URL and adding it to our URL array
    }
}
//print_r($results_urls); // Printing out our array of URLs we've just scraped
if ($_GET["key"] === null) {
    echo "Keyword Missing ";
} else if (isset($_GET["key"])) {
    echo json_encode($results_urls);
}
?>
for old torrentz.eu scraper code ref: GIT repo
First thing: you get the notice "Undefined variable: results_urls" because $results_urls is used without being defined first. Define it, then use it.
Do something like:
// $results_urls defined here:
$results_urls = [];
// For each separate result, scrape the URL
foreach ($separate_results as $separate_result) {
    if ($separate_result != "") {
        $results_urls[] = scrape_between($separate_result, "\">", "<b>"); // Scraping the result URL and adding it to our array
    }
}
Secondly, null is printed because $results_urls is not getting populated, which in turn is because $separate_results is not populated correctly: it has just one value, which is empty.
I debugged further and found that $results_page is false, so whatever you are trying to do in the scrape_between function is not working as expected. Fix your function.
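For example, a minimal debugging sketch (assuming the markers may simply be missing from the downloaded page) that checks each stage before scraping:
$results_page = curl($url); // the curl() helper from the question
if ($results_page === FALSE) {
    die("Download failed entirely; check connectivity or bot protection.");
}
if (stripos($results_page, "<dl><dt>") === FALSE) {
    die("Start marker not found; the site's markup has probably changed.");
}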

Can I put timeout on PHP for()?

I can't allow file_get_contents to work for more than 1 second; if that is not possible, I need to skip to the next loop iteration.
for ($i = 0; $i <= 59; ++$i) {
    $f = file_get_contents('http://example.com');
    if (timeout < 1 sec) - do something and loop next;
    else skip file_get_contents(), do something else, and loop next;
}
Is it possible to make a function like this?
Actually I'm using curl_multi, and I can't figure out how to set a timeout on a WHOLE curl_multi request.
If you are working with HTTP URLs only, you can do the following:
$ctx = stream_context_create(array(
    'http' => array(
        'timeout' => 1
    )
));
for ($i = 0; $i <= 59; $i++) {
    file_get_contents("http://example.com/", false, $ctx);
}
However, this is just the read timeout, meaning the time between two read operations (or the time before the first read operation). If the download rate is constant, there will not be such gaps in the download, and it can take even an hour.
If you want the whole download to take no more than a second, you can't use file_get_contents() anymore. I would encourage you to use cURL in this case, like this:
// create curl resource
$ch = curl_init();
for ($i = 0; $i < 59; $i++) {
    // set url
    curl_setopt($ch, CURLOPT_URL, "example.com");
    // set a 1-second timeout for the whole transfer
    curl_setopt($ch, CURLOPT_TIMEOUT, 1);
    // return the transfer as a string
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // $output contains the output string
    $output = curl_exec($ch);
}
// close curl resource to free up system resources (after the loop, so the handle can be reused)
curl_close($ch);
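The question also asks about curl_multi; a hedged sketch (my assumption of the intended usage, not tested against the asker's code) where the same one-second cap is set per easy handle, so every transfer in the batch is individually limited:
$urls = array("http://example.com/", "http://example.org/");
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 1); // cap each transfer at one second
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($running > 0);
foreach ($handles as $ch) {
    $output = curl_multi_getcontent($ch); // empty if that handle timed out
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);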

PHP, cURL and JSON

My question is: I want to access my FB friends using cURL and decode the response as JSON, then show only those friends whose names start with the letter "a", such as aman, adam, etc. Please help me. Following is my code.
<?php
// create a new cURL resource
$json_url = "https://graph.facebook.com/100001513782830/friends?access_token=AAACEdEose0cBAPdK62FSjs4RvA21efqc8ZBKyzAesT5r4VSpu0XScAYDtKrCxk4PmcRBVzE2SLiGvs2d5FeXvZAD72ZCShwge3vk4DQqRAb8vLlm1W3";
$ch = curl_init( $json_url );
// Configuring curl options
/* $options = array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER => array('Content-type: application/json')
);
// Setting curl options
curl_setopt_array( $ch );
*/
// Getting results
$result = curl_exec($ch); // Getting JSON result string
$obj = json_decode($result, true);
foreach ($obj[data] as $p) {
    echo '
    Name: ' . $p[name][first] . '
    Age: ' . $p[age] . '
    ';
}
You will of course try not to hardcode "a", but for this purpose:
foreach ($obj['data'] as $p) {
    if (strtolower(substr(trim($p['name']['first']), 0, 1)) == 'a') {
        echo 'Name: ' . $p['name']['first'] . ' Age: ' . $p['age'];
    }
}
Btw, it is not a good idea to post security tokens (in URLs) in public places.
Since the name is a string, you can simply iterate over that array and filter by name:
$letter = 'A';
foreach ($obj['data'] as $p) {
    if ($p['name'][0] == $letter) {
        // do something with $p
    }
}
But there is a little problem with UTF-8: this solution (and the one with substr too) will not work on multibyte characters. So you need to use mb_substr instead of the plain substr function:
foreach ($obj['data'] as $p) {
    if (mb_strtolower(mb_substr($p['name'], 0, 1)) == 'á') {
        echo "Name: ", $p['name'], "\n",
             "Age: ", $p['age'], "\n";
    }
}
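The same filtering can also be expressed with array_filter(); a small sketch, assuming each entry of $obj['data'] carries string 'name' and 'age' values as in the snippets above:
$letter = 'a';
$matches = array_filter($obj['data'], function ($p) use ($letter) {
    // multibyte-safe comparison of the first character
    return mb_strtolower(mb_substr($p['name'], 0, 1, 'UTF-8'), 'UTF-8') === $letter;
});
foreach ($matches as $p) {
    echo "Name: ", $p['name'], "\n", "Age: ", $p['age'], "\n";
}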

How to use cURL to find subdomains containing specific strings?

I am trying to build a list of all the city pages on ghix.com, which does not have such a complete directory. To do this I am using their 'city id', which is unique for each city but does not follow any particular order.
I am using cURL and PHP to loop through the possible URLs to search for those that match actual cities. Simple enough. See the code below, which produces a 500 Internal Server Error.
If it worked, the output would be a list of 'city ids' which do not match actual cities. If the URL matches an actual city, the page will not contain (0); if it does not match a city, the page will contain (0).
I have looked this over and corrected it several times; what is causing the error?
<html>
<?php
for ($i = 1; ; $i <= 1000000; $i++) {
    $url = "http://www.ghix.com/goto/dynamic/city?CityID=" . $i;
    $term = "(0)";
    curl_setopt($ch, CURLOPT_URL, trim($url));
    $html = curl_exec($ch);
    if ($html !== FALSE && stristr($html, $term) !== FALSE) { // Found!
        echo $i;
        Echo "br/";
    }
}
?>
</html>
UPDATE
A slightly different approach I tried, with the same effect:
<html>
<?php
for ($i = 1; $i <= 100; $i++) {
    $url = "http://www.ghix.com/goto/dynamic/city?CityID=" . $i;
    $term = "(0)";
    curl_setopt($ch, CURLOPT_URL, trim($url));
    $html = curl_exec($ch);
    if (strpos($ch, $term)) {
        echo $url;
        echo "<br>";
    }
?>
</html>
In your first chunk of code, you have an extra ; in the for conditions. Next, you need to initialize cURL with $ch = curl_init(); near the beginning; that opens the handle $ch that you call on later. Finally, use != for the false condition in the if instead of the exclamation with the double equals. After these fixes, I'm not getting any 500 errors. After that, it's just a matter of collecting the data from the pages and putting it in the right places.
In the second chunk of code, you still need to initialize cURL. Then you need to put in another curly bracket at the end to close out the for loop. And then it's a matter of dealing with the output from cURL.
It seems you are getting more typo errors than anything. Watch your server logs and they will tell you more about what you are looking for. And for cURL, read up on the options that you can set in PHP on the PHP site. It's a good read. Good luck.
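To make that concrete, here is a minimal corrected sketch of the first loop (my reconstruction, not the asker's exact intent): the handle is initialized once, the stray semicolon is gone, and CURLOPT_RETURNTRANSFER is added so curl_exec() returns the page instead of printing it:
<html>
<?php
$ch = curl_init(); // open the handle once, outside the loop
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // make curl_exec() return the page
$term = "(0)";
for ($i = 1; $i <= 1000000; $i++) {
    curl_setopt($ch, CURLOPT_URL, "http://www.ghix.com/goto/dynamic/city?CityID=" . $i);
    $html = curl_exec($ch);
    if ($html !== FALSE && stristr($html, $term) !== FALSE) { // "(0)" present: not a real city
        echo $i;
        echo "<br/>";
    }
}
curl_close($ch);
?>
</html>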
What I have found is that you can use the auto-complete JSON request to find all the city ids.
The JSON request URL is http://www.ghix.com/goto/dynamic/suggest?InputVarName=q&q=FRAGMENT&Type=json&SkipPrerequisites=1, where FRAGMENT is the letters you type in the input box. Iterative requests on that URL would reveal all the CityIDs you are looking for.
Be aware, they may have bot protection against such AJAX queries.
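A rough sketch of that iteration (the two-letter fragment list is my assumption; inspect the response format before parsing it):
foreach (range('a', 'z') as $first) {
    foreach (range('a', 'z') as $second) {
        $fragment = $first . $second;
        $url = "http://www.ghix.com/goto/dynamic/suggest?InputVarName=q&q=" . $fragment . "&Type=json&SkipPrerequisites=1";
        $jsons = file_get_contents($url); // requires allow_url_fopen; cURL works too
        // decode $jsons (see the Services_JSON note below) and harvest the CityIDs
    }
}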
I saw that their JSON is malformed. It can be fixed using the Services_JSON PEAR package:
require_once 'Services/JSON.php';
$Services_JSON = new Services_JSON();
$json = $Services_JSON->decode($jsons); // $jsons holds the raw response body
Using dxtool/WebGet, the following code seems to work:
require_once("WebGet.php");
// common headers to make this request more real.
$headers = array(
"Accept-Charset" => "ISO-8859-1,utf-8;q=0.7,*;q=0.7",
"Accept-Encoding" => "gzip, deflate",
"Accept-Language" => "en-us,en;q=0.5",
"Connection" => "keep-alive",
"Referer" => "http://www.ghix.com/goto/dynamic/search",
"User-Agent" => "Mozilla/5.0 (Ubuntu; X11; Linux x86_64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1"
);
$w = new WebGet();
$w->cookieFile = dirname(__FILE__) . "/cookie.txt";
// do a page landing
$w->requestContent("http://www.ghix.com/goto/dynamic/search", array(), array(), $headers);
$city_ids = array();
for ($i = 1; $i <= 1000; $i++) {
$url = "http://www.ghix.com/goto/dynamic/city?CityID=" . $i;
$term = "> (0)<";
$w->requestContent($url, array(), array(), $headers);
// sleep some times to make it more like request from human
usleep(10000);
if (strpos($w->cachedContent, $term) !== FALSE){
$city_ids[] = $i;
echo $url, PHP_EOL;
}
}
print_r($city_ids);

Getting final urls of shortened urls (like bit.ly) using php

[Updated At Bottom]
Hi everyone.
Start With Short URLs:
Imagine that you've got a collection of 5 short urls (like http://bit.ly) in a php array, like this:
$shortUrlArray = array("http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123");
End with Final, Redirected URLs:
How can I get the final url of these short urls with php? Like this:
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
I have one method (found online) that works well with a single url, but when looping over multiple urls, it only works with the final url in the array. For your reference, the method is this:
function get_web_page( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true, // return web page
        CURLOPT_HEADER => true, // return headers
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_ENCODING => "", // handle all encodings
        CURLOPT_USERAGENT => "spider", // who am i
        CURLOPT_AUTOREFERER => true, // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
        CURLOPT_TIMEOUT => 120, // timeout on response
        CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );
    //$header['errno'] = $err;
    //$header['errmsg'] = $errmsg;
    //$header['content'] = $content;
    print($header[0]);
    return $header;
}
// Using the above method in a for loop
$finalURLs = array();
$lineCount = count($shortUrlArray);
for ($i = 0; $i <= $lineCount; $i++) {
    $singleShortURL = $shortUrlArray[$i];
    $myUrlInfo = get_web_page( $singleShortURL );
    $rawURL = $myUrlInfo["url"];
    array_push($finalURLs, $rawURL);
}
Close, but not enough
This method works, but only with a single url. I can't use it in a for loop, which is what I want to do. When used in the above example in a for loop, the first four elements come back unchanged and only the final element is converted into its final url. This happens whether your array is 5 elements or 500 elements long.
Solution Sought:
Please give me a hint as to how you'd modify this method to work when used inside of a for loop with collection of urls (Rather than just one).
-OR-
If you know of code that is better suited for this task, please include it in your answer.
Thanks in advance.
Update:
After some further prodding I've found that the problem lies not in the above method (which, after all, seems to work fine in for loops) but possibly in encoding. When I hard-code an array of short urls, the loop works fine. But when I pass in a block of newline-separated urls from an html form using GET or POST, the above-mentioned problem ensues. Are the urls somehow being changed into a format not compatible with the method when I submit the form?
New Update:
You guys, I've found that my problem was due to something unrelated to the above method. My problem was that the URL encoding of my short urls converted what I thought were just newline characters (separating the urls) into this: %0D%0A, which is a carriage return plus a line feed... All short urls save for the final one in the collection had a "ghost" character appended to the tail, making it impossible to retrieve the final urls for those. I identified the ghost character, corrected my PHP explode, and all works fine now. Sorry, and thanks.
This may be of some help: How to put string in array, split by new line?
You would probably do something like this, assuming you're getting the URLs returned in POST:
$final_urls = array();
$short_urls = explode( chr(10), $_POST['short_urls'] ); // You can replace chr(10) with "\n" or "\r\n", depending on how you get your urls. And of course, change $_POST['short_urls'] to the source of your string.
foreach ( $short_urls as $short ) {
    $final_urls[] = get_web_page( $short );
}
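Given the asker's later update about %0D%0A, a small defensive variant (my addition) strips stray carriage returns before fetching:
// trim() removes the "\r" ghost character left by "%0D%0A" line endings
$short_urls = array_map('trim', explode("\n", $_POST['short_urls']));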
I get the following output, using var_dump($final_urls); and your bit.ly url:
http://codepad.org/8YhqlCo1
And my source: $_POST['short_urls'] = "http://bit.ly/123\nhttp://bit.ly/123\nhttp://bit.ly/123\nhttp://bit.ly/123";
I also got an error using your function: Notice: Undefined offset: 0 in /var/www/test.php on line 27. Line 27 is print($header[0]); I'm not sure what you wanted there...
Here's my test.php, if it will help: http://codepad.org/zI2wAOWL
I think you almost have it there; note that the loop condition should be $i < $lineCount rather than $i <= $lineCount, since the last valid index is $lineCount - 1. Try this:
$shortUrlArray = array("http://yhoo.it/2deaFR",
"http://bit.ly/900913",
"http://bit.ly/4m1AUx");
$finalURLs = array();
$lineCount = count($shortUrlArray);
for($i = 0; $i < $lineCount; $i++){
$singleShortURL = $shortUrlArray[$i];
$myUrlInfo = get_web_page( $singleShortURL );
$rawURL = $myUrlInfo["url"];
printf($rawURL."\n");
array_push($finalURLs, $rawURL);
}
I implemented a script that, for each line of a plain text file with one shortened url per line, gets the corresponding redirect url:
<?php
// input: text file with one bitly shortened url per line
$plain_urls = file_get_contents('in.txt');
$bitly_urls = explode("\r\n", $plain_urls);
// output: where should we write
$w_out = fopen("out.csv", "a+") or die("Unable to open file!");
foreach ($bitly_urls as $bitly_url) {
    $c = curl_init($bitly_url);
    curl_setopt($c, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36');
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, 0); // don't follow; we only want the Location header
    curl_setopt($c, CURLOPT_HEADER, 1);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 20);
    // curl_setopt($c, CURLOPT_PROXY, 'localhost:9150');
    // curl_setopt($c, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);
    $r = curl_exec($c);
    // get the redirect url:
    $redirect_url = curl_getinfo($c)['redirect_url'];
    curl_close($c); // free the handle before the next iteration
    // write output as csv
    $out = '"' . $bitly_url . '";"' . $redirect_url . '"' . "\n";
    fwrite($w_out, $out);
}
fclose($w_out);
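As a side note (my addition, not part of the original answers): redirect_url only reports the next hop. If you instead let cURL follow the whole redirect chain, curl_getinfo() with CURLINFO_EFFECTIVE_URL returns the final landing URL in one call:
$c = curl_init('http://bit.ly/900913');
curl_setopt($c, CURLOPT_FOLLOWLOCATION, 1); // follow every redirect hop
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1); // don't echo the body
curl_exec($c);
$final_url = curl_getinfo($c, CURLINFO_EFFECTIVE_URL); // last URL after all redirects
curl_close($c);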
Have fun and enjoy!
pw
