The function below returns a different result from what we get when we load the page directly in a browser.
What could be causing this?
function file_get_contents_curl($url) {
$ch = curl_init();
$useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1)Gecko/20061204 Firefox/2.0.0.1";
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); //Set curl to return the data instead of printing it to the browser.
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt ($ch, CURLOPT_TIMEOUT, 10);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
My guess is that you're accessing the page with one browser but setting $useragent to another. The external site might return different data depending on the user agent.
I am trying to write a script that logs in to a particular site and fetches multiple pages from it into PHP variables. I am using the same cookie file in both cURL requests.
In the attached code, the first cURL request returns the home page, but the second request returns the login page instead of the requested page (it seems to be treated as a new session), even though the same cookie file is used both times.
<?php
$username='xxxxx';
$password='xxxxxxx';
$cookie="C:/Games/cookie21.txt";
$url = 'http://xxxx/xxx/login.php';
$postdata = "?&uname=$username&upwd=$password";
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt ($ch, CURLOPT_REFERER, $url);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt ($ch, CURLOPT_POST, 1);
$result = curl_exec ($ch);
echo $result;
curl_close($ch);
$url1 = 'http:///xxxx/xxx/forms/personal_info.php';
$ch1 = curl_init();
curl_setopt ($ch1, CURLOPT_URL, $url);
curl_setopt ($ch1, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt ($ch1, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt ($ch1, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch1, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch1, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch1, CURLOPT_COOKIEJAR, $cookie);
curl_setopt ($ch1, CURLOPT_REFERER, $url1);
$result1 = curl_exec ($ch1);
echo $result1;
curl_close($ch1);
?>
Can someone explain the reason for this odd behavior, and what needs to change in this code to fetch multiple pages from a site into PHP variables using php-curl?
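Two details in the posted code are worth checking. The second request sets CURLOPT_URL to $url (the login page) rather than $url1, and it sets only CURLOPT_COOKIEJAR, which writes cookies out when the handle is closed, without CURLOPT_COOKIEFILE, which is the option that reads them back in. A minimal sketch of options that do both is below; session_options() is a hypothetical helper name, and the URLs and cookie path in the usage comments are placeholders rather than the asker's real values.

```php
<?php
// Hypothetical helper: builds the options for one request in a cookie-backed session.
function session_options(string $url, string $cookieFile, ?string $postBody = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 60,
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies saved by earlier requests
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies back when the handle is closed
    ];
    if ($postBody !== null) {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = $postBody; // body only, no leading "?"
    }
    return $options;
}

// Usage sketch (placeholder variables):
// $ch = curl_init();
// curl_setopt_array($ch, session_options($loginUrl, $cookie, $postBody));
// $home = curl_exec($ch);
// unset($ch); // the jar is flushed when the handle is closed or released
//
// $ch1 = curl_init();
// curl_setopt_array($ch1, session_options($url1, $cookie)); // note $url1, not $url
// $page = curl_exec($ch1);
```

Using the same helper for every request makes it harder to forget one of the two cookie options or to pass the wrong URL variable.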
I've been playing with this cURL Facebook login script for a while, just trying to get to grips with some of cURL's features, but I can't seem to get the cookies to register:
php script
function facebookLogin(){
$login_email = 'email';
$login_pass = 'pass';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.facebook.com/login.php');
curl_setopt($ch, CURLOPT_POSTFIELDS,'email='.urlencode($login_email).'&pass='.urlencode($login_pass).'&login=Login');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_REFERER, "http://www.facebook.com");
$page = curl_exec($ch);
echo $page;
}
I have a text file called cookies.txt in the same directory as the script, but after running the script nothing is written to it, so no cookies are saved. This is a big problem when trying to visit other pages on the same site, because you have to log in again every time.
Where am I going wrong?
OK, it turns out the login is registered even if cookies.txt looks empty, but you need to make sure you pass this file to cURL when you visit other parts of the site, e.g.:
function facebookGoToMessages(){
facebookLogin();
$ch = curl_init ("http://www.facebook.com/messages");
curl_setopt ($ch, CURLOPT_COOKIEFILE, "cookies.txt");
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_REFERER, "http://www.facebook.com");
$page = curl_exec ($ch);
echo $page;
}
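An alternative that avoids depending on when cookies.txt gets written is to reuse one handle for both requests. libcurl keeps cookies in memory for the lifetime of a handle once its cookie engine is enabled, while the jar file is only written when the handle is closed. A rough sketch, assuming that approach fits the use case; make_session_handle() and session_get() are hypothetical names.

```php
<?php
// Hypothetical helper: one handle with the in-memory cookie engine switched on.
function make_session_handle(string $userAgent)
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_COOKIEFILE     => '', // empty string: enable cookies without reading a file
    ]);
    return $ch;
}

// Hypothetical helper: GET a URL on an existing handle, keeping its cookies.
function session_get($ch, string $url)
{
    curl_setopt($ch, CURLOPT_HTTPGET, true); // switch back to GET after an earlier POST
    curl_setopt($ch, CURLOPT_URL, $url);
    return curl_exec($ch); // string on success, false on failure
}

// Usage sketch: log in with a POST on $ch first, then call
// session_get($ch, 'http://www.facebook.com/messages') on the same handle.
```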
When I add this option to the cURL handle:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
it prints the content of the site I'm sending the request to. The problem is that I want to parse the returned content myself ($response = curl_exec($ch);), but the page content is being displayed. I want to keep the content in my $response variable so I can parse it, while stopping it from being displayed.
Is that feasible?
The curl code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $action);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, '');
curl_setopt($ch, CURLOPT_COOKIEFILE, '');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields);
$response = curl_exec($ch);
curl_close($ch);
You should set the CURLOPT_RETURNTRANSFER flag to true. From the PHP.net manual:
CURLOPT_RETURNTRANSFER: TRUE to return the transfer as a string of the return value of curl_exec() instead of outputting it directly.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
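With that option set, curl_exec() returns the body as a string (or false on failure) and prints nothing itself, so any remaining output would have to come from an explicit echo elsewhere in the script. A small sketch of a wrapper that makes the failure case explicit; fetch_body() is a hypothetical name, and throwing an exception is an assumption about what the caller wants.

```php
<?php
// Hypothetical wrapper: returns the response body, never prints it.
function fetch_body(string $url, ?string $postBody = null): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of outputting it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 60,
    ]);
    if ($postBody !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $postBody);
    }

    $response = curl_exec($ch);
    if ($response === false) {
        // curl_error() describes the most recent failure on this handle
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $response;
}

// Usage sketch: $response = fetch_body($action, $fields); then parse $response.
```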
I'm trying to login to a secure aspx site using curl, and retrieve some of the account's data.
The page uses the aspx __VIEWSTATE to keep track of the browser's state. From checking the request headers here is the sequence:
user GETs Login.aspx (including __VIEWSTATE)
user POSTs __VIEWSTATE, loginName and loginPassword to Login.aspx -> server responds with 302
user GETs Submissions.aspx
Submissions.aspx is a table of different clients referred to by __EVENTTARGET=dgrdSubmissions$ctl0x$ctl00, where the first $ctl0x represents that client's row.
user POSTs __VIEWSTATE, __EVENTTARGET and an AdvisorView param to Submissions.aspx -> server responds with 302
user GETs Policy.aspx
This works fine in the browser (Chrome; the site suspiciously breaks in Firefox with the message "Exception of type 'System.Web.HttpUnhandledException' was thrown"), but in my PHP script the GET of Policy.aspx returns the login page and not the expected client info.
Here is my code (minus error-checking and page displaying):
Helper Functions:
function curl_page($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$data=curl_exec($ch);
curl_close($ch);
return $data;
}
function curl_ssl_page($url="",$postdata=""){
$ch = curl_init();
$cookie = 'cookie.txt';
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt ($ch, CURLOPT_REFERER, $url);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt ($ch, CURLOPT_POST, 1);
$result = curl_exec ($ch);
return $result;
}
function curl_get_page($url=""){
$ch = curl_init();
$cookie = 'cookie.txt';
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt ($ch, CURLOPT_REFERER, $url);
$result = curl_exec ($ch);
return $result;
}
Pages
Pages - Login:
if(isset($_POST['user-name'])) {
//GET login page
$url = "http://www.gryphinonline.ca/Login.aspx";
$login_page = $this->curl_page($url);
// get viewstate
$regexViewstate = '/__VIEWSTATE\" value=\"(.*)\"/i';
$regexEventVal = '/__EVENTVALIDATION\" value=\"(.*)\"/i';
$viewstate = $this->regexExtract($login_page,$regexViewstate,1);
$eventval = $this->regexExtract($login_page, $regexEventVal,1);
//Post to login page
$postdata = '__VIEWSTATE='.rawurlencode($viewstate)
.'&txtLoginName='.$_POST['user-name']
.'&txtPassword='.$_POST['password']
.'&Start=Login+%2F+Ouverture+de+session';
$this->curl_ssl_page($url,$postdata);
header("Location:http://url-edited/submissions");
}
Pages - Submissions:
$url = "http://www.gryphinonline.ca/Submissions.aspx";
$submissions = $this->curl_get_page($url);
$dom = new DOMDocument();
@$dom->loadHTML($submissions);
// scrape for data including viewstate
$view = $dom->getElementById('dgrdSubmissions');
if(!$view) header("Location://url-edited/login");
$h_data = $dom->getElementsByTagName('div');
$h_data = $h_data->item(0);
if(isset($_POST['__EVENTTARGET'])){
$postdata=array();
foreach ($_POST as $key => $value) {
$postdata[]=$key.'='.$value;
}
$postdata = implode('&', $postdata);
$this->curl_ssl_page($url,$postdata);
header("Location:http://url-edited/policy");
}
Pages - Policy:
$url = "http://www.gryphinonline.ca/Policy.aspx";
$policy = $this->curl_get_page($url);
All the HTTP requests and cookies are identical as far as I can tell. Anyone have any idea what is going on here? Is this possibly related to the site's problems with Firefox or am I misunderstanding something basic?
I've been at this for a few days and any help would be appreciated.
Turns out I had forgotten to urlencode the POST string to submissions.
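One way to avoid hand-encoding is to collect the form fields in an array and let http_build_query() percent-encode the names and values. This matters for __VIEWSTATE in particular, since base64 data can contain +, / and = characters, and an unencoded + in a form body is decoded as a space. The sketch below also pulls the hidden fields out of the page with DOMDocument rather than a regular expression; extract_hidden_fields() and build_post_body() are hypothetical helpers, and the sample HTML and the ctl03 row value are made up for illustration.

```php
<?php
// Hypothetical helper: collect every named hidden <input> as name => value.
function extract_hidden_fields(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings for sloppy HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//input[@type="hidden"][@name]') as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

// Hypothetical helper: URL-encode every key and value and join them with "&".
function build_post_body(array $fields): string
{
    return http_build_query($fields, '', '&');
}

// Made-up sample page standing in for Submissions.aspx.
$html = '<form><input type="hidden" name="__VIEWSTATE" value="/wEPDw+a/b=" /></form>';
$fields = extract_hidden_fields($html);
$fields['__EVENTTARGET'] = 'dgrdSubmissions$ctl03$ctl00'; // illustrative row id
echo build_post_body($fields), "\n";
// prints: __VIEWSTATE=%2FwEPDw%2Ba%2Fb%3D&__EVENTTARGET=dgrdSubmissions%24ctl03%24ctl00
```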
Any idea where my code is going wrong? I am trying to connect through a proxy with the cURL functions in PHP. I'm assuming the proxy works, because I tried a few from this list, http://hidemyass.com/proxy-list/search-234921, but I can't seem to get any of them to function correctly.
Thoughts?
function my_fetch($url,$user_agent='Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)')
{
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt ($ch, CURLOPT_HEADER, 0);
curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($ch, CURLOPT_PROXY, '75.74.244.122:1523');
$data = curl_exec();
curl_close($ch);
return $result;
}
It doesn't look like the proxy you are using is working:
jasonfunk@jasonfunk-laptop:$ telnet 75.74.244.122 1523
Trying 75.74.244.122...
telnet: Unable to connect to remote host: Connection refused
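The same reachability check can be done from PHP before handing the proxy to cURL. Note also that the function as posted calls curl_exec() without the handle and returns $result although the response is stored in $data, so it would fail even with a working proxy. A rough sketch of the check, assuming a plain host:port proxy string; parse_proxy() and proxy_is_reachable() are hypothetical helpers, and a successful TCP connection only shows that the port is open, not that the proxy will relay requests.

```php
<?php
// Hypothetical helper: split "host:port" into its parts, or return null if malformed.
function parse_proxy(string $proxy): ?array
{
    $parts = explode(':', trim($proxy));
    if (count($parts) !== 2 || $parts[0] === '' || !ctype_digit($parts[1])) {
        return null;
    }
    $port = (int) $parts[1];
    if ($port < 1 || $port > 65535) {
        return null;
    }
    return ['host' => $parts[0], 'port' => $port];
}

// Hypothetical helper: true if a TCP connection to the proxy opens within the timeout.
function proxy_is_reachable(string $proxy, float $timeoutSeconds = 5.0): bool
{
    $target = parse_proxy($proxy);
    if ($target === null) {
        return false;
    }
    // @ suppresses the warning fsockopen() raises when the connection is refused
    $socket = @fsockopen($target['host'], $target['port'], $errno, $errstr, $timeoutSeconds);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

// Usage sketch:
// if (!proxy_is_reachable('75.74.244.122:1523')) { /* pick another proxy */ }
```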
You can try several proxies by picking one at random on each call with this script.
Get a random proxy:
function get_random_proxy(){
// Read one proxy per line, dropping line endings and blank lines
$f_contents = file ("proxy.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$line = $f_contents[array_rand ($f_contents)];
return $line;
}
Call the cURL function using one proxy chosen at random:
function get_curl_proxy($url){
$proxy_ip = get_random_proxy();
$agent = "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/532.4 (KHTML, like Gecko) Chrome/4.0.233.0 Safari/532.4";
$referer = "http://www.google.com/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($ch, CURLOPT_PROXY, $proxy_ip);
curl_setopt($ch, CURLOPT_REFERER, $referer);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_MAXREDIRS, 2);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
For further reference see this
http://altafphp.blogspot.in/2012/06/using-proxies-with-curl-in-php.html