PHP cURL - how to emulate exactly the same request as a user?

I am trying to write a website scraper, but the website behaves differently than it does for a normal request from a browser.
How can I make a cURL request that the website will not filter and block?
Any help would be appreciated.
$curl_handle = curl_init ("***");
$header = array();
$header[] = "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0";
$header[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$header[] = "Accept-Language: cs,en-US;q=0.7,en;q=0.3";
$header[] = "Accept-Encoding: utf-8";
$header[] = "Connection: keep-alive";
$header[] = "Host: ****";
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0');
curl_setopt($curl_handle, CURLOPT_HTTPHEADER, $header);
curl_setopt ($curl_handle, CURLOPT_COOKIEFILE, dirname(__FILE__) . '/cookie.txt');
curl_setopt ($curl_handle, CURLOPT_COOKIEJAR, dirname(__FILE__) . '/cookie.txt');
curl_setopt ($curl_handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt ($curl_handle, CURLOPT_AUTOREFERER, true);
$output = curl_exec ($curl_handle);
This is what I have so far, but it is still getting blocked.

The following cURL options might help:
curl_setopt($ch, CURLOPT_REFERER, $_SERVER['REQUEST_URI']);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
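One detail in the question's header list is worth a sketch: "Accept-Encoding: utf-8" names a charset, not a content encoding, and the User-Agent is set twice. The hypothetical helper below (browserLikeOptions is not from the original posts) collects a more consistent option set. It leaves TLS verification at its defaults, and it is only an illustration: a site can still detect a scraper by other means, so nothing here guarantees the request will pass.

```php
<?php
// Hypothetical helper: returns an option array for curl_setopt_array().
function browserLikeOptions(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects like a browser
        CURLOPT_AUTOREFERER    => true,   // set Referer on followed redirects
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0',
        // Empty string: libcurl sends Accept-Encoding for every encoding it
        // supports and decodes the response body automatically.
        CURLOPT_ENCODING       => '',
        CURLOPT_COOKIEFILE     => $cookieFile,  // read cookies from here
        CURLOPT_COOKIEJAR      => $cookieFile,  // write cookies back on close
        // No Host, Connection or User-Agent entries: cURL sets those itself.
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: cs,en-US;q=0.7,en;q=0.3',
        ],
    ];
}

// Usage (the URL and cookie path are placeholders):
// $ch = curl_init();
// curl_setopt_array($ch, browserLikeOptions('https://example.com/', __DIR__ . '/cookie.txt'));
// $output = curl_exec($ch);
```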

Related

Use cookie with cURL PHP

I need to get the source code of a page that is only accessible when I am logged in. However, when I use the following code, it doesn't recognize that I'm logged in, and asks me to re-login.
$opts = array('http' => array('header'=> 'Cookie: ' . $COOKIEFILE."\r\n"));
$context = stream_context_create($opts);
$file = file_get_contents('http://www.example.com/', false, $context);
echo $file;
I have researched and found examples similar to mine, but they don't help me. When I use the above code it doesn't recognize my cookies.
Full code:
$USERNAME = 'myEmail';
$PASSWORD = 'myPassword';
$COOKIEFILE = tempnam ("/tmp", "CURLCOOKIE");
$ch = curl_init();
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, $COOKIEFILE);
curl_setopt($ch, CURLOPT_COOKIEFILE, $COOKIEFILE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
curl_setopt($ch, CURLOPT_URL,
'https://accounts.google.com/ServiceLogin?hl=en&service=alerts&continue=http://www.google.com/alerts/manage');
$data = curl_exec($ch);
$formFields = getFormFields($data);
$formFields['Email'] = $USERNAME;
$formFields['Passwd'] = $PASSWORD;
unset($formFields['PersistentCookie']);
$post_string = '';
foreach($formFields as $key => $value) {
$post_string .= $key . '=' . urlencode($value) . '&';
}
$post_string = substr($post_string, 0, -1);
curl_setopt($ch, CURLOPT_URL, 'https://accounts.google.com/ServiceLoginAuth');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
$result = curl_exec($ch);
$ch = curl_init ("http://example.com");
curl_setopt ($ch, CURLOPT_COOKIEFILE, $COOKIEFILE);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec ($ch);
var_dump($output);
Allen, in situations like yours, when I have trouble logging in with cURL, I normally use Live HTTP Headers for Firefox to log my requests and then pass the same request headers to cURL with CURLOPT_HTTPHEADER. The cookies are included in the headers.
Flow:
Log in to the website using your browser. Once logged in, capture any request with Live HTTP Headers for Firefox, then copy and paste the request headers into an array, something like:
$myHeaders = array(
"Host: stackoverflow.com",
"User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0",
"Accept: application/json, text/javascript, */*; q=0.01",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate",
"DNT: 1",
"Content-Type: application/x-www-form-urlencoded; charset=UTF-8",
"X-Requested-With: XMLHttpRequest",
"Referer: http://stackoverflow.com/questions/32933636/use-cookie-with-curl-php",
"Content-Length: 482",
"Cookie: mycookie :)",
"Connection: keep-alive",
"Pragma: no-cache"
);
Then, use the headers with curl:
curl_setopt($ch, CURLOPT_HTTPHEADER, $myHeaders );
Notes:
1- No need for CURLOPT_COOKIEJAR or CURLOPT_COOKIEFILE
2- This works with almost every site I've tried.
3- Some cookies may expire after a while; others, like Google's, never expire.
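A note on the file_get_contents() attempt in the question: $COOKIEFILE holds the path of the cookie jar, so 'Cookie: ' . $COOKIEFILE sends a file name rather than any cookies. As a rough sketch, the hypothetical function below (cookieHeaderFromJar is an illustrative name) turns the contents of a Netscape-format jar, the format libcurl writes for CURLOPT_COOKIEJAR, into a Cookie header value. It deliberately ignores domain, path and expiry matching, so it only suits a jar that was filled for a single site.

```php
<?php
// Hypothetical helper: build a Cookie header value from the text of a
// Netscape-format cookie jar. No domain/path/expiry filtering is done.
function cookieHeaderFromJar(string $jarContents): string
{
    $pairs = [];
    foreach (preg_split('/\R/', $jarContents) as $line) {
        if (strpos($line, '#HttpOnly_') === 0) {
            // libcurl prefixes HttpOnly cookies this way; they are still cookies.
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        // Fields: domain, subdomain flag, path, secure, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $pairs[] = $fields[5] . '=' . $fields[6];
        }
    }
    return implode('; ', $pairs);
}

// Usage with the stream context from the question (jar path is the asker's):
// $cookies = cookieHeaderFromJar(file_get_contents($COOKIEFILE));
// $opts = array('http' => array('header' => 'Cookie: ' . $cookies . "\r\n"));
```

Note that libcurl writes the jar only when the handle is closed, so the file is empty until curl_close() is called or the handle goes out of scope.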

Php CURL to work with cookies and session

function get_data($url,$proxy=Null){
$agents = array(
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1',
'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100508 SeaMonkey/2.0.4',
'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)',
'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; da-dk) AppleWebKit/533.21.1 (KHTML, like Gecko) Version/5.0.5 Safari/533.21.1'
);
$header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
$header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
$header[] = "Cache-Control: max-age=0";
$header[] = "Connection: keep-alive";
$header[] = "Keep-Alive: 300";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Accept-Language: en-us,en;q=0.5";
$header[] = "Pragma: ";
$curl = curl_init();
curl_setopt($curl, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl,CURLOPT_USERAGENT,$agents[array_rand($agents)]);
curl_setopt($curl, CURLOPT_REFERER, "http://google.com/");
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE); ///** Follow Redirect
$html1 = curl_exec($curl);
curl_close($curl);
return $html1;
}
Above is my function. I am trying to fetch a page from a proxy site:
echo get_data('http://www.hostfast.info/browse.php?u=lZpnCp2dHRM0%2BnBp1Ljfmr8I%2BA%3D%3D&b=5');
But this is not working: it returns the home page of that site, and a new search does not work either. I am new to cURL, but I think it has something to do with cookies. How can I fix this?
Thanks.
To save cookies in cURL with PHP:
curl_setopt($curl, CURLOPT_COOKIEFILE, "yourcookiefile.txt");
curl_setopt($curl, CURLOPT_COOKIEJAR, "yourcookiefile.txt");
define('POSTURL', 'http://hostfast.info/includes/process.php?action=update');
define('POSTVARS', 'u=google.com/complete/search?output=toolbar&q=love'); // POST VARIABLES TO BE SENT
$ch = curl_init(POSTURL);
curl_setopt($ch, CURLOPT_POST ,1);
curl_setopt($ch, CURLOPT_POSTFIELDS ,POSTVARS);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,1);
curl_setopt($ch, CURLOPT_HEADER ,0); // DO NOT RETURN HTTP HEADERS
curl_setopt($ch, CURLOPT_RETURNTRANSFER ,1); // RETURN THE CONTENTS OF THE CALL
curl_setopt($ch, CURLOPT_COOKIEFILE, "yourcookiefile.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "yourcookiefile.txt");
$Rec_Data = curl_exec($ch);
curl_close($ch);
echo $Rec_Data;
This works .. ;)
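One caveat about the POSTVARS string above: the "&q=love" part is parsed as a second field named q, not as part of the value of u. If the intent is to send the whole search URL as u (an assumption, since the post does not say), the value needs URL-encoding. A minimal sketch using http_build_query(), with buildPostBody as a hypothetical wrapper name:

```php
<?php
// Sketch: encode POST fields so that "?", "&" and "=" inside a value are
// not read as field separators.
function buildPostBody(array $fields): string
{
    return http_build_query($fields);
}

// Assumption: the whole search URL is meant to be the value of "u".
$body = buildPostBody(['u' => 'google.com/complete/search?output=toolbar&q=love']);
echo $body, "\n";
```

The resulting string can be passed to CURLOPT_POSTFIELDS in place of POSTVARS. Passing the array itself to CURLOPT_POSTFIELDS also works, but cURL then sends the body as multipart/form-data rather than application/x-www-form-urlencoded.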

Unable to download URL using Curl

I'm trying to download the URL http://es.extpdf.com/nagore-pdf.html using the following code, but I get a status code of 0 in return. When I access it through http://web-sniffer.net/, it shows a 301 redirect. My code seems to work fine for other 301-redirected URLs.
What could be the problem?
<?php
print disavow_download_url("http://es.extpdf.com/nagore-pdf.html");
function disavow_download_url($url) {
$custom_headers = array();
$custom_headers[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$custom_headers[] = "Pragma: no-cache";
$custom_headers[] = "Cache-Control: no-cache";
$custom_headers[] = "Accept-Language: en-us;q=0.7,en;q=0.3";
$custom_headers[] = "Accept-Charset: utf-8,windows-1251;q=0.7,*;q=0.7";
$ch = curl_init();
$useragent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1";
curl_setopt($ch, CURLOPT_USERAGENT, $useragent); // set user agent
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
//curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, $custom_headers);
//these two from https
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
curl_setopt($ch, CURLOPT_TIMEOUT, 10); //timeout in seconds
$txResult = curl_exec($ch);
$statuscode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
print "statuscode=$statuscode\n";
print "result=$txResult\n";
}
The URL is accessible from the USA, but not from your region. It worked for web-sniffer because their server is hosted in the USA (or in some other region that extpdf allows).
I used a USA proxy with cURL and it returned data.
curl_setopt($ch, CURLOPT_PROXY, "100.9.90.1:3128"); // change IP, Port
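A status code of 0 means cURL never received an HTTP response at all (for example the connection was refused or timed out), so the status code alone cannot say why. As a sketch, the hypothetical function below (fetchWithDiagnostics is an illustrative name) also reports curl_errno() and curl_error(), and accepts an optional proxy in the "host:port" form used above.

```php
<?php
// Hypothetical helper: fetch a URL and report cURL's own error details.
function fetchWithDiagnostics(string $url, ?string $proxy = null): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    // An empty string disables any proxy taken from environment variables;
    // otherwise use the given "host:port" proxy.
    curl_setopt($ch, CURLOPT_PROXY, $proxy ?? '');

    $body = curl_exec($ch);

    return [
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE), // 0 if no response
        'errno'  => curl_errno($ch),                       // 0 if no cURL error
        'error'  => curl_error($ch),                       // human-readable reason
        'body'   => $body === false ? null : $body,
    ];
}

// Usage (the proxy address is a placeholder, as in the answer):
// print_r(fetchWithDiagnostics('http://es.extpdf.com/nagore-pdf.html'));
// print_r(fetchWithDiagnostics('http://es.extpdf.com/nagore-pdf.html', '100.9.90.1:3128'));
```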

CURL set date via php

I have been searching around for a while without any luck.
I'm wondering whether it is possible to set the date of a server with cURL.
I currently have this code to log in and retrieve data:
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $loginURL);
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookieFile);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $cookieFile);
curl_setopt ($ch, CURLOPT_REFERER, $loginURL);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt ($ch, CURLOPT_POST, 1);
But sometimes I want the server I'm getting data from to think that it is another date.
I know I can make another cURL "portion" to get data for a specific date via the URL, but I figure it is faster to call the remote server only once. So is it possible to set a header or something?
Specifically, I want to do this: trick the server that I call via cURL into thinking that it is two days in the future.
You should take a detailed look at the PHP.net curl_setopt manual.
With curl, you can set a full header with curl_setopt($handle, CURLOPT_HTTPHEADER, $header).
I don't entirely understand what you are attempting to do in your specific case, but hopefully this example might be useful:
$agent = 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)';
$header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
$header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
$header[] = "Cache-Control: max-age=0";
$header[] = "Connection: keep-alive";
$header[] = "Keep-Alive: 300";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Accept-Language: en-us,en;q=0.5";
$header[] = "Pragma: ";
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt ($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 0);
$fileContents = curl_exec($ch);
curl_close($ch);
Further, if you are working with multiple curl requests, you might consider a great library written by Josh Fraser called Rolling_Curl.
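For the date question specifically, a request can carry a Date header like any other header, but that only states the client's idea of the time. It does not change the server's clock, and whether the server uses it at all is entirely up to the server, so treat this as an experiment rather than a solution. A minimal sketch, with futureDateHeader as a hypothetical helper name:

```php
<?php
// Hypothetical helper: build an HTTP Date header N days ahead of $now.
// $now defaults to the current time and is injectable for testing.
function futureDateHeader(int $daysAhead, ?int $now = null): string
{
    $timestamp = ($now ?? time()) + $daysAhead * 86400;
    // HTTP dates are always expressed in GMT.
    return 'Date: ' . gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Usage with the header array from the example above:
// $header[] = futureDateHeader(2);
// curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
```

If the site offers a URL or form parameter for the date, as the question suggests it does, that remains the dependable route.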

PHP curl cannot read a page

PHP cURL works fine on our localhost but not on any other server.
Here is the code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:13.0) Gecko/20100101 Firefox/13.0.1" );
curl_setopt($ch, CURLOPT_URL, 'http://50.7.243.50:8054/played.html');
curl_setopt($ch, CURLOPT_POST, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER ,0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch,CURLOPT_TIMEOUT,120);
curl_setopt($ch,CURLOPT_MAXREDIRS,20);
$data = curl_exec ($ch);
if ($data == false)
echo "CURL ERROR : ".curl_error($ch)."<br>";
curl_close ($ch);
Even if we do this:
curl_setopt($ch, CURLOPT_URL, 'http://50.7.243.50/played.html');
curl_setopt($ch, CURLOPT_PORT, 8054);
it shows the error "couldn't connect to host".
Any help is appreciated.
Try this code; I have tested it and it works.
$html = "http://50.7.243.50:8054/played.html";
$header = array();
$header[] = 'Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5';
$header[] = 'Cache-Control: max-age=0';
$header[] = 'Connection: keep-alive';
$header[] = 'Keep-Alive: 300';
$header[] = 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7';
$header[] = 'Accept-Language: en-us,en;q=0.5';
$header[] = 'Pragma: ';
$ch = curl_init();
$proxy_ip = '122.72.112.148';
$proxy_port = 80;
curl_setopt($ch, CURLOPT_PROXYPORT, $proxy_port);
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
curl_setopt($ch, CURLOPT_PROXY, $proxy_ip);
curl_setopt($ch, CURLOPT_URL, $html);
curl_setopt ($ch, CURLOPT_PORT , 8054);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)');
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$result = curl_exec($ch);
curl_close ($ch);
echo $result;
Try the code below:
$html = "http://50.7.243.50:8054/played.html";
$header = array();
$header[] = 'Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5';
$header[] = 'Cache-Control: max-age=0';
$header[] = 'Connection: keep-alive';
$header[] = 'Keep-Alive: 300';
$header[] = 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7';
$header[] = 'Accept-Language: en-us,en;q=0.5';
$header[] = 'Pragma: ';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $html);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)');
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$result = curl_exec($ch);
curl_close ($ch);
echo $result;
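Since the request works from localhost and fails elsewhere with "couldn't connect to host", one thing worth ruling out is that the other server cannot open outbound connections to a non-standard port such as 8054. That is an assumption; nothing in the thread confirms it. A small sketch that tests plain TCP reachability without cURL, using a hypothetical canConnect helper:

```php
<?php
// Hypothetical helper: true if a TCP connection to host:port can be opened.
function canConnect(string $host, int $port, float $timeout = 5.0): bool
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the warning fsockopen() raises on failure; the boolean
    // return value carries the outcome instead.
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

// Usage, run on the server where cURL fails:
// var_dump(canConnect('50.7.243.50', 8054));
// var_dump(canConnect('50.7.243.50', 80));
```

If the port check fails from that server but succeeds locally, going through a proxy on an allowed port, as in the first answer, is one way around it.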
