I want to get the data from the following page: http://kovv.mavari.be/kalender.aspx, as it appears when you press the submit button with no values selected in the dropdown lists (so the page where you see a big table).
I've tried to follow a tutorial you can find here: http://www.mishainthecloud.com/2009/12/screen-scraping-aspnet-application-in.html.
This is what I have so far:
public function teamsoostVlaanderen()
{
$url = "http://kovv.mavari.be/kalender.aspx";
$regs=array();
$cookies = '../src/VolleyScout/VolleyScoutBundle/Resources/doc/cookie.txt';
// regular expressions to parse out the special ASP.NET
// values for __VIEWSTATE and __EVENTVALIDATION
$regexViewstate = '/__VIEWSTATE\" value=\"(.*)\"/i';
$regexEventVal = '/__EVENTVALIDATION\" value=\"(.*)\"/i';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$data=curl_exec($ch);
$viewstate = $this->regexExtract($data,$regexViewstate,$regs,1);
$eventval = $this->regexExtract($data, $regexEventVal,$regs,1);
$postData = '__VIEWSTATE='.rawurlencode($viewstate)
.'&__EVENTVALIDATION='.rawurlencode($eventval)
.'&ctl00_ContentPlaceHolder1_ddlGeslacht'
.'&ctl00$ContentPlaceHolder1$ddlReeks'
.'&ctl00_ContentPlaceHolder1_ddlDatum'
.'&ctl00$ContentPlaceHolder1$btnZoek:zoek'
;
curl_setOpt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);
$data = curl_exec($ch);
echo $data;
curl_close($ch);
die();
}
public function regexExtract($text, $regex, $regs, $nthValue)
{
if (preg_match($regex, $text, $regs)) {
$result = $regs[$nthValue];
}
else {
$result = "";
}
return $result;
}
But I still get the page as if nothing had been posted (so without the table). When I check my cookie.txt file it is empty; maybe that is the problem? Can somebody help me find it?
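Use a regex that stops at the closing quote of the value attribute. The greedy (.*) keeps matching up to the last quote on the line: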
$regexViewstate = '/__VIEWSTATE\" value=\"([^"]*)\"/i';
$regexEventVal = '/__EVENTVALIDATION\" value=\"([^"]*)\"/i';
Your POST parameters are also missing the equals sign (and the button parameter uses a colon instead of one):
$postData = '__VIEWSTATE='.rawurlencode($viewstate)
.'&__EVENTVALIDATION='.rawurlencode($eventval)
.'&ctl00_ContentPlaceHolder1_ddlGeslacht='
.'&ctl00$ContentPlaceHolder1$ddlReeks='
.'&ctl00_ContentPlaceHolder1_ddlDatum='
.'&ctl00$ContentPlaceHolder1$btnZoek=zoek';
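As an alternative to the regexes, here is a minimal sketch that reads the hidden fields with DOMDocument and lets http_build_query() do the URL encoding. This is a swapped-in technique, not what the tutorial uses. The sample markup and the hiddenFields() helper are assumptions for illustration; the real page may contain other fields.

```php
<?php
// Sample markup standing in for the fetched page (an assumption, not real site output).
$html = '<form>'
    . '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUK+abc=" />'
    . '<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAgKx/def" />'
    . '</form>';

// Hypothetical helper: collect every hidden input as name => value.
function hiddenFields(string $html): array
{
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $fields = [];
    foreach ($dom->getElementsByTagName('input') as $input) {
        if (strtolower($input->getAttribute('type')) === 'hidden') {
            $fields[$input->getAttribute('name')] = $input->getAttribute('value');
        }
    }
    return $fields;
}

$fields = hiddenFields($html);
// Button name taken from the question; the value is what the question posts.
$fields['ctl00$ContentPlaceHolder1$btnZoek'] = 'zoek';

// http_build_query() encodes characters such as "+", "/", "=" and "$".
$postData = http_build_query($fields);
echo $postData, "\n";
```

The resulting string can be passed to CURLOPT_POSTFIELDS as-is, so there is no separator or equals sign to forget.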
I am having a little problem with something. I have the following code, which uses cURL to log in to a website, posting all the required data:
<?php
$url = "https://someurl/login.aspx";
$ckfile = tempnam("/tmp", "CURLCOOKIE");
$useragent = $_SERVER['HTTP_USER_AGENT'];
$username = "someuser#hotmail.co.uk";
$password = "somepassword";
$f = fopen('log.txt', 'w');
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
$html = curl_exec($ch);
curl_close($ch);
preg_match('~<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)" />~', $html, $viewstate);
preg_match('~<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="(.*?)" />~', $html, $eventValidation);
$viewstate = $viewstate[1];
$eventValidation = $eventValidation[1];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, false);
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_STDERR, $f);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
$postfields = array();
$postfields['__EVENTTARGET'] = "";
$postfields['__EVENTARGUMENT'] = "";
$postfields['__VIEWSTATE'] = $viewstate;
$postfields['__EVENTVALIDATION'] = $eventValidation;
$postfields['btnLogin'] = "Login";
$postfields['txtPassword'] = $password;
$postfields['txtUserName'] = $username;
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
$ret = curl_exec($ch);
The above code works fine. I do not close the cURL handle yet because I need to keep the cookie active. Once I have logged in using the above, I do:
if($ret) {
curl_setopt($ch, CURLOPT_URL, 'https://someurl.com/financial/quote.aspx?id=12345');
curl_setopt($ch, CURLOPT_POST, 0);
$data = curl_exec($ch);
var_dump($data);
}
I can see from the output that I am now on the correct page. However, in the example above, I do not post anything. The page I go to has a button on it. When I look at what this button posts in Firebug, I see this:
__EVENTARGUMENT
__EVENTTARGET
__EVENTVALIDATION fsudifhsiudgfiusgdf
__VIEWSTATE
__VIEWSTATE_GUID 0f26cc24-ef59-4bc7-87c0-141833df148b
ctl00$PageContent$btn2 Accepted
As such, I have tried to replicate the pushing of this button by doing the following
if($ret) {
curl_setopt($ch, CURLOPT_URL, 'https://someurl.com/financial/quote.aspx?id=12345');
$postfieldsInner = array();
$postfieldsInner['__EVENTTARGET'] = "";
$postfieldsInner['__EVENTARGUMENT'] = "";
$postfieldsInner['__VIEWSTATE'] = "";
$postfieldsInner['__VIEWSTATE_GUID'] = $viewstate;
$postfieldsInner['__EVENTVALIDATION'] = $eventValidation;
$postfieldsInner['ctl00$PageContent$btn2'] = "Accepted";
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postfieldsInner);
$content = curl_exec($ch);
if (!$content) {
echo 'An error has occurred: ' . curl_error($ch);
} else {
var_dump($content);
}
}
This time however, the button action does not seem to occur. Am I missing something when making the second request?
Thanks
So, a couple of things jump out at me:
You're assuming that the magic values for __VIEWSTATE and __EVENTVALIDATION won't change. This is unlikely to be the case. You should pull those values again after fetching the data page.
You're passing the $viewstate variable as the value for __VIEWSTATE on the initial login, but you leave that blank on the subsequent post and instead pass $viewstate as __VIEWSTATE_GUID. Not sure if this is intentional or not.
You're using an array for CURLOPT_POSTFIELDS, which may cause problems. The documentation says:
Passing an array to CURLOPT_POSTFIELDS will encode the data as multipart/form-data, while passing a URL-encoded string will encode the data as application/x-www-form-urlencoded.
And, very importantly, do NOT disable certificate validation; fix your server setup instead.
A few other suggestions, perhaps more a matter of style than substance though.
Passing an empty string as CURLOPT_COOKIEFILE will enable session handling without the need to save to a file.
You don't need to call curl_close() and curl_init() multiple times in a script; just reuse the existing handle. This saves you from redefining the options and lets you reuse the session cookies.
Use curl_setopt_array() for cleaner code.
curl_exec() returns false on error, you should check for it explicitly.
Here's how I'd clean up the code:
<?php
$url = "https://someurl/login.aspx";
$ckfile = tempnam("/tmp", "CURLCOOKIE");
$useragent = $_SERVER['HTTP_USER_AGENT'];
$username = "someuser#hotmail.co.uk";
$password = "somepassword";
$viewstate_pattern = '~<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)" />~';
$eventval_pattern = '~<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="(.*?)" />~';
$ch = curl_init();
curl_setopt_array($ch, [
CURLOPT_URL => $url,
CURLOPT_COOKIEFILE => "",
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_USERAGENT => $useragent,
]);
// Getting the login form
$html = curl_exec($ch);
if ($html !== false) {
preg_match($viewstate_pattern, $html, $viewstate);
preg_match($eventval_pattern, $html, $eventValidation);
$viewstate = $viewstate[1];
$eventValidation = $eventValidation[1];
$postfields = http_build_query([
"__EVENTTARGET"=>"",
"__EVENTARGUMENT"=>"",
"__VIEWSTATE"=>$viewstate,
"__EVENTVALIDATION"=>$eventValidation,
"btnLogin"=>"Login",
"txtPassword"=>$password,
"txtUserName"=>$username,
]);
curl_setopt_array($ch, [
CURLOPT_REFERER=>$url,
CURLOPT_POST=>true,
CURLOPT_POSTFIELDS=>$postfields,
]);
// Submitting the login form
$html = curl_exec($ch);
if ($html !== false) {
curl_setopt_array($ch, [
CURLOPT_URL=>'https://someurl.com/financial/quote.aspx?id=12345',
CURLOPT_POST=>false,
]);
// Getting the data page
$html = curl_exec($ch);
if ($html !== false) {
preg_match($viewstate_pattern, $html, $viewstate);
preg_match($eventval_pattern, $html, $eventValidation);
$viewstate = $viewstate[1];
$eventValidation = $eventValidation[1];
$postfieldsInner = http_build_query([
"__EVENTTARGET"=>"",
"__EVENTARGUMENT"=>"",
// Should this be empty?
"__VIEWSTATE"=>"",
"__VIEWSTATE_GUID"=>$viewstate,
"__EVENTVALIDATION"=>$eventValidation,
'ctl00$PageContent$btn2'=>"Accepted",
]);
curl_setopt_array($ch, [
CURLOPT_POST=>true,
CURLOPT_POSTFIELDS=>$postfieldsInner,
]);
// Posting the data page
$html = curl_exec($ch);
if ($html === false) {
echo 'An error has occurred: ' . curl_error($ch);
} else {
var_dump($html);
}
} else {
// Error getting the data page
}
} else {
// Error submitting the login page
}
} else {
// Error getting the login page
}
curl_close($ch);
I want to make an inline bot, and when I do this:
function sendResponse($url, $data){
$ch = curl_init();
//curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: multipart/form-data'));
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, array('inline_query_id' => $data['inline_query_id'], 'results' => json_encode($data['results'])));
$output = curl_exec($ch);
return $output;
}
It won't work. The error (with or without the header) is: {"ok":false,"error_code":400,"description":"[Error]: Bad request: Field \"message_text\" must be of type String"}
but when I do it like this:
function sendResponse($url, $data){
$ch = curl_init();
//curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: multipart/form-data'));
curl_setopt($ch, CURLOPT_URL, $url.'?inline_query_id='.rawurlencode($data['inline_query_id']).'&results='.rawurlencode(json_encode($data['results'])));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
//curl_setopt($ch, CURLOPT_POST, 1);
//curl_setopt($ch, CURLOPT_POSTFIELDS, $q);
$output = curl_exec($ch);
return $output;
}
It works! The problem is that with the second method the request URI becomes too large, so I cannot use it.
Any way of sending this data is fine with me. Thanks!
The code that builds $data is here:
$result = connectWebsite(SITE_SEARCH_URL, urlencode($update['inline_query']['query']));
$result = json_decode($result);
$output = array();
$output['inline_query_id'] = $update['inline_query']['id'];
$i = 0;
foreach($result as $post){
$data = array();
$data['type'] = 'article';
$data['id'] = strval($post->ID);
$data['title'] = '('.$post->atypes.') '.$post->title;
if(strlen($post->content) > 2100)
$tmp = substr($post->content, 0, 2096).'...';
$data['message_text'] = '<b>'.$post->title.'</b>'.ucwords($post->genre, ',').$tmp;
$data['parse_mode'] = 'HTML';
if(strlen($post->content) > 200)
$tmp = substr($post->content, 0, 196).'...';
//$data['description'] = ucwords($post->genre, ',').' | '.$tmp;
$output['results'][$i] = $data;
$i++;
if($i == MAX_RESULTS)
break;
}
sendResponse(API_URL.'answerInlineQuery', $output);
It might help someone, so I'll answer it myself.
The problem was the UTF-8 encoding: substr() cuts by bytes and can split a multibyte character, which leaves an invalid string.
I replaced substr with mb_substr.
I also added this as the first line: mb_internal_encoding("UTF-8");
With that, the problem was solved. Now I can send my inline query results (or any other command) via POST, without the URL length problem.
Thanks, everyone, for your help.
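To illustrate the difference, here is a minimal sketch, assuming the mbstring extension is loaded. The truncateUtf8() helper and the sample text are made up for this example and are not part of the original bot code.

```php
<?php
// Hypothetical helper: truncate by characters (not bytes) and append a suffix.
function truncateUtf8(string $text, int $maxChars, string $suffix = '...'): string
{
    if (mb_strlen($text, 'UTF-8') <= $maxChars) {
        return $text;
    }
    $keep = $maxChars - mb_strlen($suffix, 'UTF-8');
    return mb_substr($text, 0, $keep, 'UTF-8') . $suffix;
}

// "é" and "ö" each take two bytes in UTF-8.
$text = "h\u{e9}llo w\u{f6}rld";

$byteCut = substr($text, 0, 2);              // "h" plus half of "é"
$charCut = mb_substr($text, 0, 2, 'UTF-8');  // "h" plus the whole "é"

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)

// json_encode() refuses malformed UTF-8 and returns false.
var_dump(json_encode($byteCut));                // bool(false)

$short = truncateUtf8($text, 5);
var_dump(mb_strlen($short, 'UTF-8'));           // int(5)
```

Passing the encoding explicitly to each mb_* call has the same effect as setting mb_internal_encoding("UTF-8") once at the top of the script.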
When I decode the commented-out $jsonString string, it works very well.
But when I use the string returned by cURL, it does not work; json_decode returns null.
Please help me with this.
if (isset($_POST['dkno'])) {
$dcktNo = $_POST['dkno'];
$url = 'http://ExampleStatus.php?dkno=' . $dcktNo;
$myvars = '';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $myvars);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$jsonString = curl_exec($ch);
// $jsonString = '[{"branchname":"BHUBNESHWAR","consignee":"ICICI BANK LTD","currentstatus":"Delivered by : BHUBNESHWAR On - 25/07/2015 01:00","dlyflag":"Y","PODuploaded":"Not Uploaded"}]';
if ($jsonString != '') {
$json = str_replace(array('[', ']'), '', $jsonString);
echo $json;
$obj = json_decode($json);
if (is_null($obj)) {
die("<br/>Invalid JSON, don't need to keep on working on it");
} else {
$podStatus = $obj->PODuploaded;
}
}
}
After the cURL call, I used the following approach to extract only the JSON data from the returned HTML page (the JSON sits inside the element with id Label1).
1) fetchData.php
$url = 'http://DocketStatusApp.aspx?dkno=' . $dcktNo;
$myvars = '';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $myvars);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$jsonString = curl_exec($ch);
// now get only value
$dom = new DOMDocument();
$dom->loadHTML($jsonString);
$thediv = $dom->getElementById('Label1');
echo $thediv->textContent;
2) JSONprocess.php
if (isset($_POST['dkno'])) {
$dcktNo = $_POST['dkno'];
ob_start(); // begin collecting output
include_once 'fetchData.php';
$result = ob_get_clean(); // Completed collecting output
// Now it will show & take only JSON Data from Div Tag
$json = str_replace(array('[', ']'), '', $result);
$obj = json_decode($json);
if (is_null($obj)) {
die("<br/>Invalid JSON, don't need to keep on working on it");
} else {
$podStatus = $obj->PODuploaded;
}
}
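As a side note, stripping the square brackets with str_replace would also alter any value that happens to contain "[" or "]". A minimal sketch of decoding the array directly is below; it assumes the extracted text has the same shape as the commented-out sample in the question.

```php
<?php
// Sample payload copied from the commented-out line in the question.
$jsonString = '[{"branchname":"BHUBNESHWAR","consignee":"ICICI BANK LTD",'
    . '"currentstatus":"Delivered by : BHUBNESHWAR On - 25/07/2015 01:00",'
    . '"dlyflag":"Y","PODuploaded":"Not Uploaded"}]';

// Decode the array as-is; passing true returns associative arrays instead of objects.
$rows = json_decode(trim($jsonString), true);

if (!is_array($rows)) {
    // json_last_error_msg() says why decoding failed, e.g. HTML wrapped around the JSON.
    die('Invalid JSON: ' . json_last_error_msg());
}

// The payload is a list with one record, so read the first element.
$podStatus = $rows[0]['PODuploaded'] ?? null;
echo $podStatus, "\n"; // Not Uploaded
```

Printing json_last_error_msg() on the raw cURL response is also a quick way to confirm that the response is HTML rather than bare JSON.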
I am trying to log in to Joomla 1.6/3.0 with cURL in PHP, without success.
I tried the method that works for Joomla 1.5:
$uname = "id";
$upswd = "pswd";
$url = "http://www.somewebpage.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url );
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE );
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE );
curl_setopt($ch, CURLOPT_COOKIEJAR, './cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, './cookie.txt');
curl_setopt($ch, CURLOPT_HEADER, FALSE );
$ret = curl_exec($ch);
if (!preg_match('/name="([a-zA-Z0-9]{32})"/', $ret, $spoof)) {
preg_match("/name='([a-zA-Z0-9]{32})'/", $ret, $spoof);
}
// POST fields
$postfields = array();
$postfields['username'] = urlencode($uname);
$postfields['passwd'] = urlencode($upswd);
$postfields['lang'] = '';
$postfields['option'] = 'com_login';
$postfields['task'] = 'login';
$postfields[$spoof[1]] = '1';
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
$ret = curl_exec($ch);
But it shows "forbidden".
When a form is submitted from the web browser, Joomla! normally sends a token along with it, to make sure the form was really posted from the site's own page (and not from somewhere outside, as you are trying to do).
If the token is missing or wrong, you get an error like 'The most recent request was denied because it contained an invalid security token. Please refresh the page and try again.'
You may want to look for an authentication plugin that supports what you are trying to do, for example the Autologin plugin.
This is my cURL POST function:
public function curlPost($url, $data)
{
$fields = '';
foreach($data as $key => $value) {
$fields .= $key . '=' . $value . '&';
}
rtrim($fields, '&');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, count($data));
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$result = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
}
$this->curlPost('remoteServer', array(data));
How do I read the POST on the remote server?
The remote server is using PHP, but which variable in $_POST[] should I read?
For example: $_POST['fields'] or $_POST['result']?
Your code works, but I'd advise you to add two other things:
A. CURLOPT_FOLLOWLOCATION, in case the server answers with an HTTP 302 redirect
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
B. A return statement, in case you need to output the result
return $result ;
Example
function curlPost($url, $data) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
$result = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
return $result;
}
print(curlPost("http://yahoo.com", array()));
Another Example
print(curlPost("http://your_SITE", array("greeting"=>"Hello World")));
To read your post you can use
print($_REQUEST['greeting']);
or
print($_POST['greeting']);
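The loop in the question concatenates keys and values without URL-encoding them, so a value containing "&" or "=" would corrupt the body. Here is a minimal sketch of the encoding step using http_build_query(), with parse_str() standing in for what the remote PHP script does when it fills $_POST. The payload is made up for illustration.

```php
<?php
// Hypothetical payload; the keys are what the remote script will see in $_POST.
$data = ['greeting' => 'Hello World', 'note' => 'a&b=c'];

// http_build_query() URL-encodes keys and values and joins the pairs with "&".
$body = http_build_query($data);
echo $body, "\n"; // greeting=Hello+World&note=a%26b%3Dc

// parse_str() decodes such a body in much the same way PHP populates $_POST.
parse_str($body, $received);
var_dump($received === $data); // bool(true)
```

So the remote server reads $_POST['greeting'] and $_POST['note'], i.e. the keys of the array you pass in, not 'fields' or 'result'.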
As with a normal POST request, all posted data can be found in $_POST (except files, of course, which end up in $_FILES). To tell requests apart, you can add, for example, action=request1 to the URL's query string:
if ($_GET['action'] == 'request1') {
print_r ($_POST);
}
EDIT: To see the POST vars, use the following in your POST handler file:
if ($_GET['action'] == 'request1') {
ob_start();
print_r($_POST);
$contents = ob_get_contents();
ob_end_clean();
error_log($contents, 3, 'log.txt' );
}