cURL web scraping from VBulletin - php

Hi, I need some help. I am new to cURL and I am trying to scrape data from http://www.ugbettingforum.co.uk
so that I can send any changes in the forum to my e-mail.
But I can't get past the login. The forum runs vBulletin, and I manage to get the message "thanks for logging in, username", but after the redirect I am not logged in.
Here is my code:
function vBulletinLogin($user, $pass){
    $md5Pass = md5($pass);
    $data = "do=login&url=%2Findex.php&vb_login_md5password=$md5Pass&vb_login_username=$user&cookieuser=1";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "http://www.ugbettingforum.co.uk/login.php?do=login"); // replace ** with tt
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0");
    curl_setopt($ch, CURLOPT_TIMEOUT, '10');
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    curl_setopt($ch, CURLOPT_COOKIEJAR, "/cookie.txt");
    curl_setopt($ch, CURLOPT_COOKIEFILE, "/cookie.txt");
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $store = curl_exec($ch);
    echo $store;
    curl_close($ch);
}
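One thing that may be worth ruling out is the hand-built POST string: the username is interpolated without URL-encoding, so a name containing a space or an ampersand would change the request. The sketch below builds the same fields with http_build_query(). The field names are taken from the code above; the helper name build_vb_login_body and the sample credentials are made up for illustration. This does not by itself explain the lost session.

```php
<?php
// Hypothetical helper: builds the login POST body from the field names
// used in the question; http_build_query() percent-encodes each value.
function build_vb_login_body(string $user, string $pass): string
{
    return http_build_query([
        'do'                   => 'login',
        'url'                  => '/index.php',
        'vb_login_md5password' => md5($pass),
        'vb_login_username'    => $user,
        'cookieuser'           => 1,
    ]);
}

// Sample credentials, for illustration only.
echo build_vb_login_body('john doe', 'secret'), "\n";
```

The returned string can be passed to CURLOPT_POSTFIELDS in place of $data.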

Related

Why can't I access two different web pages using same cookie in this code [PHP-CURL]?

I am trying to write a script that logs in to a particular site and fetches multiple pages from it into PHP variables. I am using the same cookie in both cURL requests.
In the attached code, the first cURL request returns the home page, but the second request returns the login page instead of the requested page (it seems to be counted as a new request). I am using the same cookie file on both occasions.
<?php
$username='xxxxx';
$password='xxxxxxx';
$cookie="C:/Games/cookie21.txt";
$url = 'http://xxxx/xxx/login.php';
$postdata = "?&uname=$username&upwd=$password";
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt ($ch, CURLOPT_REFERER, $url);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt ($ch, CURLOPT_POST, 1);
$result = curl_exec ($ch);
echo $result;
curl_close($ch);
$url1 = 'http:///xxxx/xxx/forms/personal_info.php';
$ch1 = curl_init();
curl_setopt ($ch1, CURLOPT_URL, $url);
curl_setopt ($ch1, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt ($ch1, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt ($ch1, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch1, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch1, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch1, CURLOPT_COOKIEJAR, $cookie);
curl_setopt ($ch1, CURLOPT_REFERER, $url1);
$result1 = curl_exec ($ch1);
echo $result1;
curl_close($ch1);
?>
Can someone explain the reason for this odd behavior and the modification needed in this code to obtain multiple web pages in a site to php variables using php-curl?
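Two details in the code above stand out and may explain the behavior: the second handle sets CURLOPT_URL to $url (the login page) rather than $url1, and it sets only CURLOPT_COOKIEJAR, which writes cookies, without CURLOPT_COOKIEFILE, which is what makes cURL read and send them. A minimal sketch of one way to share a session across requests follows; the function names are hypothetical and the option values are assumptions, not a confirmed fix for this site.

```php
<?php
// Sketch only: function names are hypothetical, not from the question.

// Options shared by every request in one session. CURLOPT_COOKIEFILE makes
// cURL send the cookies stored in the file; CURLOPT_COOKIEJAR makes it
// write cookies back to the file when the handle is released.
function session_curl_options(string $url, string $cookieFile, ?string $postBody = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 60,
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_COOKIEJAR      => $cookieFile,
    ];
    if ($postBody !== null) {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = $postBody;
    }
    return $options;
}

// Performs one request and returns the body, or false on failure.
function fetch_with_session(string $url, string $cookieFile, ?string $postBody = null)
{
    $ch = curl_init();
    curl_setopt_array($ch, session_curl_options($url, $cookieFile, $postBody));
    $result = curl_exec($ch);
    unset($ch); // releasing the handle flushes cookies to the jar
    return $result;
}
```

With these helpers, the login would be fetch_with_session($url, $cookie, $postBody) and the second page fetch_with_session($url1, $cookie), both pointing at the same cookie file.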

PHP + cURL: Error: The state information is invalid for this page and might be corrupted.

I am trying to log in to
https://reg.viatoll.pl/PreLogin.aspx?language=po
with cURL. With a wrong verification code I should get "Error verification code" (in Polish: "Nieprawidlowy kod weryfikacyjny"), but instead I get
Error: The state information is invalid for this page and might be corrupted. System.Web at System.Web.UI.ViewStateException.ThrowError(Exception inner, String persistedState, String errorPageMessage, Boolean macValidationError) at...
I don't know what I am doing wrong. I noticed that it has something to do with __VIEWSTATE and __EVENTVALIDATION: if I open this page in Chrome and change something in __VIEWSTATE or __EVENTVALIDATION before posting the data, I get the same error.
My code is below:
<?php
require_once('simple_html_dom.php'); // http://simplehtmldom.sourceforge.net/manual.htm
$cookie="cookie2.txt";
$url = 'https://reg.viatoll.pl/PreLogin.aspx?language=po';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_COOKIE, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR,$cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE,$cookie);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt ($ch, CURLOPT_REFERER, $url);
$result = curl_exec($ch);
$html = str_get_html($result);
$vs = $html->find('#__VIEWSTATE')[0]->value;
$ev = $html->find('#__EVENTVALIDATION')[0]->value;
// sending data
$txtVerificationCode='asdasdas';
$txtUserName="UserName";
$txtPassword="Haselko123";
$url='https://reg.viatoll.pl/PreLogin.aspx?language=po';
$postdata = "__LASTFOCUS=&__EVENTTARGET=btnLogin&__EVENTARGUMENT=&__VIEWSTATE=".$vs."&__EVENTVALIDATION=".$ev."&txtUserName=".$txtUserName."&txtPassword=".$txtPassword."&txtVerificationCode=".$txtVerificationCode;
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $cookie); // <-- add this line
curl_setopt ($ch, CURLOPT_REFERER, $url);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt ($ch, CURLOPT_POST, 1);
$result = curl_exec ($ch);
echo "POST: " . $postdata . "<br><br><br>";
echo $result;
?>
Please anybody help :)
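A possible cause, offered as an assumption rather than a diagnosis: __VIEWSTATE and __EVENTVALIDATION are base64 strings, which can contain "+", "/" and "=". The code above concatenates them into $postdata without URL-encoding, and an unencoded "+" in a form body is decoded as a space, which would corrupt the value. The sketch below uses a made-up, shortened value to show the difference.

```php
<?php
// Illustrative value only; a real __VIEWSTATE is much longer.
$viewState = '/wEPDwUK+abc/def==';

// Concatenated without encoding, as in the question's $postdata.
$raw = '__VIEWSTATE=' . $viewState . '&txtUserName=UserName';

// Encoded with http_build_query().
$encoded = http_build_query([
    '__VIEWSTATE' => $viewState,
    'txtUserName' => 'UserName',
]);

// parse_str() decodes the strings the way form-encoded data is normally decoded.
parse_str($raw, $fromRaw);
parse_str($encoded, $fromEncoded);

var_dump($fromRaw['__VIEWSTATE'] === $viewState);     // bool(false): "+" became a space
var_dump($fromEncoded['__VIEWSTATE'] === $viewState); // bool(true)
```

Building the whole body with http_build_query(), or wrapping each value in urlencode(), keeps the values intact on the wire.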

PHP/cURL how to get Request Cookies

Hi,
I have a bit of a problem here. A client of mine asked me to create a login script for a website built in ASP. While I can capture and use the response cookies, I am not able to capture and use the request cookies. I tried, but they are not saved in my cookie file, so I cannot log in to the website. Its address is http://www.itraceuk.co.uk/Default.asp?secure=signin
Here is the code I am currently using:
$ch4 = curl_init();
curl_setopt ($ch4, CURLOPT_URL, $posturl);
curl_setopt($ch4, CURLOPT_HEADER, true);
curl_setopt ($ch4, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt ($ch4, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt ($ch4, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.101 Safari/537.36");
curl_setopt ($ch4, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch4, CURLOPT_FOLLOWLOCATION, true);
curl_setopt ($ch4, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch4, CURLOPT_COOKIESESSION, 1);
curl_setopt ($ch4, CURLOPT_COOKIEJAR, $cookie2);
curl_setopt ($ch4, CURLOPT_COOKIEFILE, $cookie2);
curl_setopt ($ch4, CURLOPT_REFERER, 'http://www.itraceuk.co.uk/Default.asp?secure=signin');
$result4 = curl_exec ($ch4);
curl_close($ch4);
<?php
$posturl = 'http://www.itraceuk.co.uk/Default.asp?secure=signin';
$cookie = __DIR__ . '/cookie.txt'; // file and folder must be writeable
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $posturl);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729)");
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIESESSION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_REFERER, 'http://www.itraceuk.co.uk/Default.asp?secure=signin');
$result = curl_exec($ch);
curl_close($ch);
var_dump($result);
var_dump(file_get_contents($cookie));
Make sure the cookie file and its folder are writable by PHP and the web server.
Reuse the cookie data after the login.
First send a cURL POST to the login form with the correct credentials (CURLOPT_POSTFIELDS); that logs you in and sets the cookie. Then reuse this cookie for the second cURL request (the code shown here).
For a complete example of a HTTPS cURL login, please see: https://stackoverflow.com/a/10307956/1163786
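The writability advice above can be checked before any request is made. The following is a small sketch; the helper name cookie_jar_is_writable and the temporary-directory location are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: reports whether PHP can write a cookie jar at $path.
function cookie_jar_is_writable(string $path): bool
{
    if (file_exists($path)) {
        return is_file($path) && is_writable($path);
    }
    // The file does not exist yet, so cURL must be able to create it.
    return is_dir(dirname($path)) && is_writable(dirname($path));
}

$cookie = sys_get_temp_dir() . '/cookie.txt'; // assumed location for this sketch
var_dump(cookie_jar_is_writable($cookie));
```

If the check returns false, cURL will typically run without error but leave the cookie file empty or missing.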

PHP cURL login and display result

I have the following code to log in to a remote website.
After executing the script I get the following error from the website: Technical Error
That is the result I get from executing cURL. My code:
$username = $_POST['search']; $password = $_POST['pasword']; $ch = curl_init(); $postdata="&search=".$username."&password=".$password; curl_setopt ($ch, CURLOPT_URL,"https://payment.schibsted.no/login?client_id=5087dc1b421c7a0b79000000&response_type=code&redirect_uri=https%3A%2F%2Fwww.finn.no%2Ffinn%2FloginCallback%3FredirectKey%3D977170356717"); curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE); curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"); curl_setopt ($ch, CURLOPT_HEADER, true); curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true); curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1); curl_setopt ($ch, CURLOPT_COOKIEJAR, 'cookie.txt'); curl_setopt ($ch, CURLOPT_COOKIEFILE, 'cookie.txt'); curl_setopt ($ch, CURLOPT_REFERER, "https://payment.schibsted.no/login?client_id=5087dc1b421c7a0b79000000&response_type=code&redirect_uri=https%3A%2F%2Fwww.finn.no%2Ffinn%2FloginCallback%3FredirectKey%3D977170356717"); curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata); curl_setopt ($ch, CURLOPT_POST, 1); $result = curl_exec($ch); curl_setopt($ch, CURLOPT_URL, "menu link") ; $result2 = curl_exec($ch) ; echo $result2 ; curl_close($ch);
I cleaned up your code, which had no line breaks, and made it a little easier to read.
//Variables
$username = $_POST['search'];
$password = $_POST['pasword'];
$cookie = $username.'.txt';
$postdata = http_build_query(['search' => $username, 'password' => $password]); // URL-encodes both values
$agent = 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)';
//*not sure if all this is needed and I don't know why you set it as the referer*
$url = 'https://payment.schibsted.no/login?client_id=5087dc1b421c7a0b79000000&response_type=code&redirect_uri=https%3A%2F%2Fwww.finn.no%2Ffinn%2FloginCallback%3FredirectKey%3D977170356717';
// ---
//Curl
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt ($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
$result = curl_exec($ch);
curl_close($ch);
// ---
echo $result;
//*Not sure what this is doing I found it at the bottom of your code*
//curl_setopt($ch, CURLOPT_URL, "menu link");
//$result2 = curl_exec($ch) ;
//echo $result2;
I don't think all this is necessary, also could you provide a little more information?

Trouble getting .aspx page using php curl

I'm trying to login to a secure aspx site using curl, and retrieve some of the account's data.
The page uses the aspx __VIEWSTATE to keep track of the browser's state. From checking the request headers here is the sequence:
user GETS from Login.aspx (including __VIEWSTATE)
user POSTS __VIEWSTATE, loginName and loginPassword to login.aspx -> server responds with 302
user GETS Submissions.aspx
submissions.aspx is a table of different clients referred to by __EVENTTARGET=dgrdSubmissions$ctl0x$ctl00 where the first $ctl0x represents that client's row.
user POSTS __VIEWSTATE, __EVENTTARGET and an AdvisorView param to Submissions.aspx -> server responds with 302
user GETS Policy.aspx
This works fine in the browser (Chrome - The site suspiciously breaks in Firefox with Message: Exception of type 'System.Web.HttpUnhandledException' was thrown) but in my php script the GET Policy.aspx responds with the login page and not the expected client info.
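The first step in the sequence above, reading the hidden fields from the fetched page, can be sketched with DOMDocument instead of a regular expression. This is an illustration under assumptions: extract_hidden_fields is a made-up name and the markup is invented, so the real page may differ. A parser avoids the risk of a greedy pattern such as (.*) matching past the closing quote when other attributes follow on the same line.

```php
<?php
// Hypothetical helper: returns name => value for every hidden <input> in $html.
function extract_hidden_fields(string $html): array
{
    $dom = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($dom->getElementsByTagName('input') as $input) {
        if (strtolower($input->getAttribute('type')) === 'hidden') {
            $fields[$input->getAttribute('name')] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Made-up markup standing in for the login page.
$html = '<form><input type="hidden" name="__VIEWSTATE" value="abc+/=" />'
      . '<input type="hidden" name="__EVENTVALIDATION" value="xyz" />'
      . '<input type="text" name="txtLoginName" value="" /></form>';

print_r(extract_hidden_fields($html));
```

The resulting array can be merged with the visible form fields and passed through http_build_query() before posting.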
Here is my code (minus error-checking and page displaying):
Helper Functions:
function curl_page($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$data=curl_exec($ch);
curl_close($ch);
return $data;
}
function curl_ssl_page($url="",$postdata=""){
$ch = curl_init();
$cookie = 'cookie.txt';
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt ($ch, CURLOPT_REFERER, $url);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt ($ch, CURLOPT_POST, 1);
$result = curl_exec ($ch);
return $result;
}
function curl_get_page($url=""){
$ch = curl_init();
$cookie = 'cookie.txt';
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt ($ch, CURLOPT_REFERER, $url);
$result = curl_exec ($ch);
return $result;
}
Pages
Pages - Login:
if(isset($_POST['user-name'])) {
//GET login page
$url = "http://www.gryphinonline.ca/Login.aspx";
$login_page = $this->curl_page($url);
// get viewstate
$regexViewstate = '/__VIEWSTATE\" value=\"(.*)\"/i';
$regexEventVal = '/__EVENTVALIDATION\" value=\"(.*)\"/i';
$viewstate = $this->regexExtract($login_page,$regexViewstate,1);
$eventval = $this->regexExtract($login_page, $regexEventVal,1);
//Post to login page
$postdata = '__VIEWSTATE='.rawurlencode($viewstate)
.'&txtLoginName='.$_POST['user-name']
.'&txtPassword='.$_POST['password']
.'&Start=Login+%2F+Ouverture+de+session';
$this->curl_ssl_page($url,$postdata);
header("Location:http://url-edited/submissions");
}
Pages - Submissions:
$url = "http://www.gryphinonline.ca/Submissions.aspx";
$submissions = $this->curl_get_page($url);
$dom = new DOMDocument();
@$dom->loadHTML($submissions);
// scrape for data including viewstate
$view = $dom->getElementById('dgrdSubmissions');
if(!$view) header("Location://url-edited/login");
$h_data = $dom->getElementsByTagName('div');
$h_data = $h_data->item(0);
if(isset($_POST['__EVENTTARGET'])){
$postdata=array();
foreach ($_POST as $key => $value) {
$postdata[]=$key.'='.$value;
}
$postdata = implode('&', $postdata);
$this->curl_ssl_page($url,$postdata);
header("Location:http://url-edited/policy");
}
Pages - Policy:
$url = "http://www.gryphinonline.ca/Policy.aspx";
$policy = $this->curl_get_page($url);
All the HTTP requests and cookies are identical as far as I can tell. Anyone have any idea what is going on here? Is this possibly related to the site's problems with Firefox or am I misunderstanding something basic?
I've been at this for a few days and any help would be appreciated.
Turns out I had forgotten to urlencode the POST string to submissions.
