When I try to fetch content using cURL, it often returns no value at all.
Note: I tested this repeatedly against the same URL.
CURL CODE
function curltest($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);

    // Pick a random browser user agent for each request.
    $agents = array(
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1',
        'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100508 SeaMonkey/2.0.4',
        'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)',
        'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; da-dk) AppleWebKit/533.21.1 (KHTML, like Gecko) Version/5.0.5 Safari/533.21.1'
    );
    curl_setopt($ch, CURLOPT_USERAGENT, $agents[array_rand($agents)]);

    // Browser-like request headers.
    $header[0]  = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[]   = "Cache-Control: max-age=0";
    $header[]   = "Connection: keep-alive";
    $header[]   = "Keep-Alive: 10";
    $header[]   = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[]   = "Accept-Language: en-us,en;q=0.5";
    $header[]   = "Pragma: ";
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);

    curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
    // CURLOPT_TIMEOUT is the limit for the whole transfer, so it must be at least
    // as long as the connect timeout; the original value of 10 cut requests short.
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_ENCODING, '');

    $output = curl_exec($ch);
    if (curl_errno($ch)) {
        // Fall back to file_get_contents() if the cURL request failed.
        $output = file_get_contents($url);
    }
    curl_close($ch);

    return $output;
}
Please give any suggestions, or tell me whether any changes are needed in the code above.
See the PHP manual for the full list of cURL options (php curl options). Different sites use different kinds of client validation; if your cURL request fails that validation, it may return nothing.
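When the response comes back empty, checking curl_errno()/curl_error() and the HTTP status code usually shows whether the transfer itself failed or the site simply refused the request. A minimal diagnostic sketch, assuming $url holds the address being tested:
// Diagnostic sketch: run the request, then report why the body was empty.
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);

$output = curl_exec($ch);
if ($output === false) {
    // Transport-level failure (DNS, timeout, SSL, ...)
    echo 'cURL error ' . curl_errno($ch) . ': ' . curl_error($ch);
} else {
    // The transfer succeeded; check whether the server refused or emptied the response.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status >= 400 || $output === '') {
        echo 'HTTP status ' . $status . ', body length ' . strlen($output);
    }
}
curl_close($ch);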
Related
I need help fixing my script so that visitors who come through it show up in Analytics as coming from tumblr.com.
When I visit my PHP page, it shows http://prntscr.com/aczmvi
Here's my script:
<?php
print curl_spoof("http://whatsmyreferer.com/");

function curl_spoof($url)
{
    $curl = curl_init();

    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";

    curl_setopt($curl, CURLOPT_URL, $url);
    // Spoof the user agent
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.17 Safari/537.36');
    // Spoof the referer
    curl_setopt($curl, CURLOPT_REFERER, "http://tumblr.com");
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);

    // curl_exec() returns false on failure; only then fall back to file_get_contents().
    if (($html = curl_exec($curl)) === false) {
        $html = file_get_contents($url);
    }
    curl_close($curl);

    return $html;
}
?>
Ideas?
I am trying to build a website scraper, but the website behaves differently than it does for a normal browser request.
How can I make a cURL request that the website will not filter and block?
Any help would be appreciated.
$curl_handle = curl_init("***");

$header = array();
$header[] = "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0";
$header[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$header[] = "Accept-Language: cs,en-US;q=0.7,en;q=0.3";
$header[] = "Accept-Encoding: utf-8";
$header[] = "Connection: keep-alive";
$header[] = "Host: ****";

curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0');
curl_setopt($curl_handle, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl_handle, CURLOPT_COOKIEFILE, dirname(__FILE__) . '/cookie.txt');
curl_setopt($curl_handle, CURLOPT_COOKIEJAR, dirname(__FILE__) . '/cookie.txt');
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_handle, CURLOPT_AUTOREFERER, true);

$output = curl_exec($curl_handle);
This is what I have so far, but it is still getting blocked.
The following cURL options might help:
curl_setopt($ch, CURLOPT_REFERER, $_SERVER['REQUEST_URI']);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
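As a rough sketch, those options could be combined into a complete request like the one below. The URLs are placeholders, and the referer is hard-coded instead of taken from $_SERVER['REQUEST_URI'] so the snippet also runs outside a web request:
// Sketch only: assembling the suggested options into one request.
// "https://example.com/page" and the referer are placeholders, not from the question.
$ch = curl_init('https://example.com/page');
curl_setopt($ch, CURLOPT_REFERER, 'https://example.com/');
curl_setopt($ch, CURLOPT_VERBOSE, true);          // prints request/response details to STDERR
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);  // disables certificate checks; debugging only
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);

$body = curl_exec($ch);
if ($body === false) {
    echo 'cURL error: ' . curl_error($ch);
}
curl_close($ch);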
I want to fetch a webpage exactly as a normal user would through a browser. I set the request headers and cookies, which I captured with Fiddler 4. Last night it worked, but now it throws cURL error 28 (request timed out).
Here is the code I used:
function cURL($url){
    $cURL = curl_init();

    $header[0] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: hu-HU,hu;q=0.8,en-US;q=0.6,en;q=0.4,es;q=0.2,it;q=0.2,de;q=0.2,fr;q=0.2";
    $header[] = "Pragma: ";

    curl_setopt($cURL, CURLOPT_URL, $url);
    // The option value must not include the "User-Agent: " header name.
    curl_setopt($cURL, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36');
    curl_setopt($cURL, CURLOPT_HTTPHEADER, $header);
    curl_setopt($cURL, CURLOPT_POST, true);
    curl_setopt($cURL, CURLOPT_POSTFIELDS, 'action=verChau');
    curl_setopt($cURL, CURLOPT_REFERER, 'http://www.google.com');
    curl_setopt($cURL, CURLOPT_AUTOREFERER, true);
    // Let cURL send Accept-Encoding and decompress the response itself, instead of
    // advertising gzip in a manual header and getting raw gzip back.
    curl_setopt($cURL, CURLOPT_ENCODING, 'gzip, deflate');
    // CURLOPT_COOKIE also takes the bare value, without the "Cookie: " prefix.
    curl_setopt($cURL, CURLOPT_COOKIE, '__gfp_64b=mgK79a4qc_M9RH4eFToSbGkkxUWaWD2tKPQ51RreN8r.A7; PHPSESSID=780d83cb35c5b82098e33fde9c101d08; __atuvc=0%7C4%2C0%7C5%2C1%7C6%2C6%7C7%2C1%7C8; cTest=1; resDone20101213=1; _gat=1; _ga=GA1.2.307256553.1418233339; _goa3=eyJ1IjoiMTQxMjEyNDEwODM2NDE4MTU4NzAxNiIsImgiOiJCQzI0OEVFRC5kc2wucG9vbC50ZWxla29tLmh1IiwicyI6MTQxODMzODgwMDAwMH0=; _goa3TC=eyI1NjM2NCI6MTQyNTEzMjMwNDE2OCwiMzEzNDUzMiI6MTQyNTE0ODk1NDg2M30=; _goa3TS=e30=');
    curl_setopt($cURL, CURLOPT_RETURNTRANSFER, 1);
    // Error 28 means this 10-second limit was hit; raise it if the server is slow.
    curl_setopt($cURL, CURLOPT_TIMEOUT, 10);

    $html = curl_exec($cURL);
    if ($html === false){
        echo "cURL exception: " . curl_errno($cURL) . ": " . curl_error($cURL);
    }
    curl_close($cURL);

    return $html;
}
Can anybody help me out?
function get_data($url, $proxy = null){
    $agents = array(
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1',
        'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100508 SeaMonkey/2.0.4',
        'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)',
        'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; da-dk) AppleWebKit/533.21.1 (KHTML, like Gecko) Version/5.0.5 Safari/533.21.1'
    );

    $header[0]  = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[]   = "Cache-Control: max-age=0";
    $header[]   = "Connection: keep-alive";
    $header[]   = "Keep-Alive: 300";
    $header[]   = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[]   = "Accept-Language: en-us,en;q=0.5";
    $header[]   = "Pragma: ";

    $curl = curl_init();
    // The $proxy argument was never used; pass it to cURL when one is given.
    if ($proxy !== null) {
        curl_setopt($curl, CURLOPT_PROXY, $proxy);
        curl_setopt($curl, CURLOPT_HTTPPROXYTUNNEL, 1);
    }
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_USERAGENT, $agents[array_rand($agents)]);
    curl_setopt($curl, CURLOPT_REFERER, "http://google.com/");
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE); // follow redirects

    $html1 = curl_exec($curl);
    curl_close($curl);

    return $html1;
}
Above is my function, and I am trying to fetch a page from a proxy site:
echo get_data('http://www.hostfast.info/browse.php?u=lZpnCp2dHRM0%2BnBp1Ljfmr8I%2BA%3D%3D&b=5');
But this is not working. It gives me the home page of that site, and when I try a new search that does not work either. I am new to cURL, but I think this has something to do with cookies. How can I fix it?
Thanks.
To save cookies in cURL with PHP:
curl_setopt($curl, CURLOPT_COOKIEFILE, "yourcookiefile.txt");
curl_setopt($curl, CURLOPT_COOKIEJAR, "yourcookiefile.txt");
define('POSTURL', 'http://hostfast.info/includes/process.php?action=update');
define('POSTVARS', 'u=google.com/complete/search?output=toolbar&q=love'); // POST variables to be sent

$ch = curl_init(POSTURL);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, POSTVARS);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);          // do not return HTTP headers
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the contents of the call
curl_setopt($ch, CURLOPT_COOKIEFILE, "yourcookiefile.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "yourcookiefile.txt");

$Rec_Data = curl_exec($ch);
curl_close($ch);
echo $Rec_Data;
This works .. ;)
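As a follow-up, once the cookie file has been written, later requests can reuse it so the proxy site keeps seeing the same session. A rough sketch; the browse.php URL is a placeholder for whichever proxified page you want to fetch next:
// Follow-up request reusing the same cookie file so the session persists.
$ch = curl_init('http://hostfast.info/browse.php?u=PLACEHOLDER');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, "yourcookiefile.txt"); // read cookies saved earlier
curl_setopt($ch, CURLOPT_COOKIEJAR, "yourcookiefile.txt");  // keep updating them
$page = curl_exec($ch);
curl_close($ch);
echo $page;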
I am using cURL to access a Facebook page. Locally it works perfectly, but when I upload it to my dev server it breaks and returns an empty string. I've checked, and cURL is installed on the server. Here's the code I use to access Facebook:
$header = array();
$header[] = 'Accept: text/json';
$header[] = 'Cache-Control: max-age=0';
$header[] = 'Connection: keep-alive';
$header[] = 'Keep-Alive: 300';
$header[] = 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7';
$header[] = 'Accept-Language: en-us,en;q=0.5';
$header[] = 'Pragma: ';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://facebook.com/feeds/page.php?format=json&id=135137236003');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)');
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$result = curl_exec($ch);
curl_close ($ch);
Any help is appreciated!
Change the Accept header to */* or application/json, since Facebook sends the response with Content-Type: application/json.
Also change this URL
http://facebook.com/feeds/page.php?format=json&id=135137236003
to
http://www.facebook.com/feeds/page.php?format=json&id=135137236003
because Facebook redirects non-www requests to www. It still works for you since you set CURLOPT_FOLLOWLOCATION, but using the www URL saves one round trip.
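For reference, a sketch of the question's request with both suggestions applied; apart from the Accept header and the URL, the options are unchanged:
// Sketch: Accept set to application/json and the www. host used directly.
$header = array();
$header[] = 'Accept: application/json';
$header[] = 'Cache-Control: max-age=0';
$header[] = 'Connection: keep-alive';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.facebook.com/feeds/page.php?format=json&id=135137236003');
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$result = curl_exec($ch);
curl_close($ch);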