I have some code that combines cURL and DOM. My code:
<?php
// Create temp file to store cookies
$ckfile = tempnam ("/tmp", "CURLCOOKIE");
// URL to login page
$url = "https://www.investagrams.com/login";
// Get Login page and its cookies and save cookies in the temp file
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
#$output = curl_exec($ch);
$fields = array(
'ctl00$WelcomePageMainContent$ctl00$Username' => '********',
'ctl00$WelcomePageMainContent$ctl00$Password' => '********',
);
$fields_string = '';
foreach($fields as $key=>$value) {
$fields_string .= $key . '=' . $value . '&';
}
$fields_string = rtrim($fields_string, '&'); // rtrim returns the trimmed string, it does not modify in place
// Post login form and follow redirects
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, count($fields));
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
#$output = curl_exec($ch);
$url = "https://www.investagrams.com/Stock/RealTimeMonitoring";
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
#echo $output;
$dom = new DomDocument;
$dom->loadHtmlFile($output);
$xpath = new DomXPath($dom);
// collect header names
$headerNames = array();
foreach ($xpath->query('//table[@id="StockQuoteTable"]//th') as $node) {
$headerNames[] = $node->nodeValue;
}
// collect data
$data = array();
foreach ($xpath->query('//tbody[@id="StockQuoteTable:tbody_element"]/tr') as $node) {
$rowData = array();
foreach ($xpath->query('td', $node) as $cell) {
$rowData[] = $cell->nodeValue;
}
$data[] = array_combine($headerNames, $rowData);
}
print_r($data);
?>
This just prints an empty array ("Array ( )").
I don't know which part is wrong. The cURL part is 100% working; the error is in the DOM part. Thank you.
Here's the markup of the table I want to extract:
<div class="dataTables_scrollBody" style="overflow: auto; height: 300px; width: 100%;">
<table id="StockQuoteTable" class="table dataTable no-footer" role="grid" aria-describedby="StockQuoteTable_info" style="width: 1166px;">
<thead></thead>
<tbody>
<tr id="num1" class="odd" role="row"
I was able to find part of the problem with your code; however, the HTML returned by the cURL request seems to contain errors that prevent DOMXPath::query from returning a match.
The problem I was able to fix was caused by using DOMDocument::loadHTMLFile instead of DOMDocument::loadHTML. loadHTMLFile expects a file path or URL, while $output already holds the HTML string returned by curl_exec, so it has to be passed to loadHTML. The corrected script is:
<?php
// Create temp file to store cookies
$ckfile = tempnam ("/tmp", "CURLCOOKIE");
// URL to login page
$url = "https://www.investagrams.com/login";
// Get Login page and its cookies and save cookies in the temp file
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
#$output = curl_exec($ch);
$fields = array(
'ctl00$WelcomePageMainContent$ctl00$Username' => '********',
'ctl00$WelcomePageMainContent$ctl00$Password' => '********',
);
$fields_string = '';
foreach($fields as $key=>$value) {
$fields_string .= $key . '=' . $value . '&';
}
$fields_string = rtrim($fields_string, '&'); // rtrim returns the trimmed string, it does not modify in place
// Post login form and follow redirects
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, count($fields));
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
#$output = curl_exec($ch);
$url = "https://www.investagrams.com/Stock/RealTimeMonitoring";
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
#echo $output;
#print_r($output);
$dom = new DomDocument;
@$dom->loadHtml($output);
$xpath = new DomXPath($dom);
// collect header names
$headerNames = array();
foreach ($xpath->query('//table[@id="StockQuoteTable"]//th') as $node) {
$headerNames[] = $node->nodeValue;
}
// collect data
$data = array();
foreach ($xpath->query('//tbody[@id="StockQuoteTable:tbody_element"]/tr') as $node) {
$rowData = array();
foreach ($xpath->query('td', $node) as $cell) {
$rowData[] = $cell->nodeValue;
}
$data[] = array_combine($headerNames, $rowData);
}
print_r($data);
?>
Additionally, I added an @ symbol before the loadHTML call to suppress the warnings it emits for malformed HTML.
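For reference, here is a minimal sketch of the parsing step on its own, run against a small hypothetical table rather than the live page (the headers and values are made up). It hands the HTML string to loadHTML, collects libxml warnings with libxml_use_internal_errors() as an alternative to the @ operator, and addresses attributes with @ in XPath. Note that the markup shown in the question has an empty <thead> and a <tbody> without an id, so on that markup the //th and tbody[@id="StockQuoteTable:tbody_element"] queries would not match anything; this sketch selects rows through the table id instead.

```php
<?php
// Hypothetical sample markup standing in for the curl_exec() result
$html = <<<HTML
<table id="StockQuoteTable">
  <thead><tr><th>Stock</th><th>Last</th></tr></thead>
  <tbody>
    <tr><td>ABC</td><td>1.50</td></tr>
    <tr><td>XYZ</td><td>2.75</td></tr>
  </tbody>
</table>
HTML;

// Collect parser warnings instead of printing them (alternative to @)
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);   // loadHTML takes markup; loadHTMLFile takes a path or URL
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Attributes are addressed with @ in XPath: [@id="..."]
$headerNames = [];
foreach ($xpath->query('//table[@id="StockQuoteTable"]//th') as $th) {
    $headerNames[] = trim($th->nodeValue);
}

$data = [];
foreach ($xpath->query('//table[@id="StockQuoteTable"]/tbody/tr') as $tr) {
    $rowData = [];
    foreach ($xpath->query('td', $tr) as $td) {
        $rowData[] = trim($td->nodeValue);
    }
    // array_combine needs one cell per header
    if (count($rowData) === count($headerNames)) {
        $data[] = array_combine($headerNames, $rowData);
    }
}

print_r($data);
```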
Related
I am scraping this website using cURL. It has a form (the left one) that I need to submit using cURL.
What I have done: as you can see in my code, the first curl_exec works fine and I get the captcha code and the other attributes correctly. In the second curl_exec, when I post all the form attributes, I get an error.
My code is here:
$url = 'http://epunjabschool.gov.in/gs_schoolwebsite/Search.aspx';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_PROXY, $proxy);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 50);
curl_setopt($ch, CURLOPT_TIMEOUT, 50);
$httpCode = curl_getinfo($ch , CURLINFO_HTTP_CODE); // this results 0 every time
$response = curl_exec($ch);
if ($response === false){
$response = curl_error($ch);
echo stripslashes($response);
}
//curl_close($ch);
$dom = new DOMDocument;
@$dom->loadHTML($response);
$tags = $dom->getElementsByTagName('input');
$VIEWSTATE = '';
$EVENTVALIDATION = '';
for($i=0;$i<$tags->length; $i++){
$grab = $tags->item($i);
//echo $grab->getAttribute('value');
if($grab->getAttribute('name') === '__VIEWSTATE'){
$VIEWSTATE = $grab->getAttribute('value');
}
if($grab->getAttribute('name') === '__EVENTVALIDATION'){
$EVENTVALIDATION = $grab->getAttribute('value');
}
}
$domx = new DOMXPath($dom);
$trans_id = $domx->query('//img[@id="imgCaptcha"]');
$imgCaptcha = '';
foreach ($trans_id as $id) {
$imgCaptcha = $id->getAttribute('src');
}
$captch = explode('=', $imgCaptcha);
echo $captcha = $captch[1];
$data = array(
"__VIEWSTATE" => $VIEWSTATE,
"__EVENTVALIDATION" => $EVENTVALIDATION,
"__EVENTARGUMENT" => '',
"__EVENTTARGET" => '',
"__LASTFOCUS" => '',
"txtCaptcha" => $captcha,
"ddlDistrict" => 2,
"ddlEdBlock" => 2,
"ddlManagement" => 1,
"ddlCategory" => 4,
"ddlDistrictBySchoolName" => "Select",
"btnShow" => "Show",
"txtCaptchbySchoolName" => "",
"txtSchoolName" => ""
);
$url = 'http://epunjabschool.gov.in/gs_schoolwebsite/Search.aspx';
$headers = array("Content-Type: application/x-www-form-urlencoded");
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($data));
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
if ($response === false){
$response = curl_error($ch);
}
echo stripslashes($response);
Issue: this is the error I am getting:
Invalid postback or callback argument. Event validation is enabled using <pages enableEventValidation="true"/> in configuration or <%@ Page EnableEventValidation="true" %> in a page. For security purposes, this feature verifies that arguments to postback or callback events originate from the server control that originally rendered them.
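A sketch of one way to approach this, under assumptions about the page. ASP.NET event validation rejects a postback when a posted control value (for example a dropdown selection) was not among the values the server registered in __EVENTVALIDATION when it rendered the page; cascading lists such as ddlEdBlock may only receive their options after a postback for ddlDistrict, which is an assumption about this site. In the code above, the first request stores no cookies and the second sets only CURLOPT_COOKIEJAR (no CURLOPT_COOKIEFILE), so the POST sends no session cookie from the GET and the captcha may belong to a different session. Separately, curl_getinfo(..., CURLINFO_HTTP_CODE) only has a value after curl_exec() has run, which is why the call placed before it returns 0. The helper names below are hypothetical.

```php
<?php
// Collect <input type="hidden" name="..." value="..."> pairs from a page
function extractHiddenFields(string $html): array
{
    if ($html === '') {
        return []; // loadHTML rejects empty input
    }
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $fields = [];
    foreach ($xpath->query('//input[@type="hidden"][@name]') as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

// One handle that both stores and resends cookies, so GET and POST share a session
function makeSessionHandle(string $url, string $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // save cookies
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // and send them back
    return $ch;
}

// POST every hidden field from $page plus the visible controls on the same handle.
// Dropdown values must be options the server actually rendered.
function postBack($ch, string $page, array $visibleFields): array
{
    // Hidden state first; visible controls override on a name clash
    $fields = array_merge(extractHiddenFields($page), $visibleFields);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    $body = curl_exec($ch);
    return [
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE), // valid only after curl_exec
        'body'   => $body,
    ];
}

// Offline check with hypothetical markup
$hidden = extractHiddenFields(
    '<form><input type="hidden" name="__VIEWSTATE" value="abc" />'
    . '<input type="hidden" name="__EVENTVALIDATION" value="xyz" />'
    . '<input type="text" name="txtCaptcha" value="" /></form>'
);
print_r($hidden);
```

Following the question's flow: create $ch = makeSessionHandle($url, 'cookie.txt'), run $page = curl_exec($ch), read the captcha from $page as the question does, then call postBack($ch, $page, ['txtCaptcha' => $captcha, 'ddlDistrict' => 2, 'btnShow' => 'Show']) with the remaining visible fields added as needed.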
I need to log in to RStudio in order to grab an image with cURL.
I use the procedure below, but it does not return the image, just the form itself.
This is the location of the form:
http://skiweather.eu:8787/auth-sign-in
<?php
$username = 'admin';
$password = 'XXX';
$loginUrl = 'http://skiweather.eu:8787/auth-do-sign-in';
$sign=1;
$cookie= "cookies.txt";
//init curl
$ch = curl_init();
//Handle cookies for the login
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//Set the URL to work with
curl_setopt($ch, CURLOPT_URL, $loginUrl);
// ENABLE HTTP POST
curl_setopt($ch, CURLOPT_POST, 1);
$response = curl_exec($ch);
if (curl_errno($ch)) die(curl_error($ch));
$dom = new DomDocument();
$dom->loadHTML($response);
$tokens = $dom->getElementsByTagName("meta");
for ($i = 0; $i < $tokens->length; $i++)
{
$meta = $tokens->item($i);
if($meta->getAttribute('name') == 'csrf-token')
$token = $meta->getAttribute('content');
}
$postinfo = 'username='.$username.'&password='.$password.'&staySignedIn=1&appUri='.$loginUrl.'&_csrf='.$token;
echo $token; //debug info
curl_setopt($ch, CURLOPT_RETURNTRANSFER, false);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postinfo);
//set the URL to the protected file
curl_setopt($ch, CURLOPT_URL, 'http://skiweather.eu:8787/files/R/devel/PLOTS/snowalert_today_0.png');
//execute the request
$content = curl_exec($ch);
curl_close($ch);
//save the data to disk
file_put_contents('pppp.png', $content);
?>
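In the script above, the second request posts the credentials to the PNG URL rather than to the login URL, the token is read from the response to a POST to auth-do-sign-in instead of from the sign-in page, and CURLOPT_RETURNTRANSFER is switched to false, in which case curl_exec() prints the response and returns true, so $content never holds the image bytes. Below is a minimal sketch of the usual flow on one handle: GET the form, POST the login, then GET the file. It assumes the form accepts the field names from the question's $postinfo; whether this RStudio Server version expects exactly those fields is not verified here, and the helper names are hypothetical.

```php
<?php
// Return the content of <meta name="csrf-token">, or null if absent
function extractCsrfToken(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML rejects empty input
    }
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    foreach ($dom->getElementsByTagName('meta') as $meta) {
        if ($meta->getAttribute('name') === 'csrf-token') {
            return $meta->getAttribute('content');
        }
    }
    return null;
}

// GET form -> POST login -> GET file, all on one handle and cookie file.
// Returns the raw file bytes, or false on failure.
function fetchProtectedFile(string $formUrl, string $loginUrl, string $fileUrl,
                            string $user, string $pass, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return data instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    // 1. GET the sign-in page for the session cookie and csrf token
    curl_setopt($ch, CURLOPT_URL, $formUrl);
    $form = curl_exec($ch);
    $token = is_string($form) ? extractCsrfToken($form) : null;
    if ($token === null) {
        return false;
    }

    // 2. POST the credentials to the login action (field names from the question)
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query([
        'username'     => $user,
        'password'     => $pass,
        'staySignedIn' => 1,
        'appUri'       => '', // assumption; the question sent the login URL here
        '_csrf'        => $token,
    ]));
    curl_exec($ch);

    // 3. Switch back to GET and download the protected file
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_URL, $fileUrl);
    return curl_exec($ch);
}
```

With the question's values, $png = fetchProtectedFile('http://skiweather.eu:8787/auth-sign-in', 'http://skiweather.eu:8787/auth-do-sign-in', 'http://skiweather.eu:8787/files/R/devel/PLOTS/snowalert_today_0.png', 'admin', 'XXX', 'cookies.txt') can be written with file_put_contents('pppp.png', $png) when it is not false. If the saved file turns out to be HTML, the login POST was not accepted.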
The form can be found here: https://services.smartree.com/mcdonalds/applicationform/WebForms/ApplyToJob.aspx
After I get __VIEWSTATE and __EVENTVALIDATION, I start the submit process.
// Start Submit process
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_STDERR, $f);
curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
// Populate all POST fields
$postfields = array();
$postfields['__EVENTVALIDATION'] = urlencode($eventValidation);
$postfields['__VIEWSTATEENCRYPTED'] = urlencode('');
$postfields['__VIEWSTATEGENERATOR'] = urlencode($eventGenerator);
$postfields['ucApplyToJob$cmbSelectedJob']=urlencode($values['job_type']);
$postfields['ucApplyToJob$txtLastName'] = $values['lname'];
$postfields['ucApplyToJob$txtFirstName'] = $values['name'];
$p = "";
foreach ($postfields as $k => $v) {
$p .= $k . '=' . $v . '&';
}
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
$ret = curl_exec($ch); // Get result after login page.
$err = curl_errno($ch);
$errmsg = curl_error($ch);
$header = curl_getinfo($ch);
curl_close($ch);
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $ret;
print_r($header);
I have populated all the required fields, but nothing works. Please help.
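A few things in the snippet stand out. $p is built but never used; the $postfields array is posted instead. When CURLOPT_POSTFIELDS is given an array, cURL sends multipart/form-data and does not URL-encode the values, so the urlencode() calls leave percent-encoded text in those fields. The posted list also omits __VIEWSTATE itself. A minimal sketch that builds one application/x-www-form-urlencoded string with http_build_query(), which encodes each value exactly once; buildPostback is a hypothetical helper, the values are placeholders, and the real form may require more fields (for example the submit button's name and value) than shown.

```php
<?php
// Build a URL-encoded ASP.NET postback body from raw (unencoded) values
function buildPostback(string $viewState, string $viewStateGenerator,
                       string $eventValidation, array $fields): string
{
    $state = [
        '__VIEWSTATE'          => $viewState,
        '__VIEWSTATEGENERATOR' => $viewStateGenerator,
        '__VIEWSTATEENCRYPTED' => '',
        '__EVENTVALIDATION'    => $eventValidation,
    ];
    // Hidden state first, then the visible controls; encoding happens here, once
    return http_build_query($state + $fields);
}

// Placeholder values for illustration
$body = buildPostback('dDw+abc/1=', 'ABCD1234', 'evt+val', [
    'ucApplyToJob$txtLastName'  => 'Doe',
    'ucApplyToJob$txtFirstName' => 'Jane',
]);
echo $body, PHP_EOL;
```

Passing this string (not an array) to curl_setopt($ch, CURLOPT_POSTFIELDS, $body) makes cURL send it as application/x-www-form-urlencoded.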
I am coding an appointment-booking script in PHP. There is a calendar with available dates to book.
I can log in, pick random dates, and extract the parameters of the available dates, but the final step of posting the data to book the appointment fails.
Once booking works with this simple script, I need to add a condition: if there are available dates, try to book; otherwise, keep refreshing.
<?php
set_time_limit(0); // no execution time limit
$ch = curl_init();
$headers[] = "Accept: */*";
$headers[] = "Connection: Keep-Alive";
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL, 'https://example.com/login/login.php');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
$co = curl_exec($ch);
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($co);
# Parse the HTML
# libxml_use_internal_errors(true) above keeps loadHTML from printing
# warnings about invalid HTML in the page.
$xpath = new DOMXPath($doc);
$val1 = $xpath->query('//input[@name="_sid"]/@value')->item(0)->nodeValue;
echo $val1;
echo '<br/>';
$field['process'] = 'login';
$field['_sid'] = $val1;
$field['email'] = 'myemail@example.com';
$field['pwd'] = '123456';
$datafield = http_build_query($field);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $datafield);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, 'https://example.com/login/myapp.php?fg_id=5568094');
$cur = curl_exec($ch);
$do = new DOMDocument(); // New DOMDocument to get the URLs from the calendar of available dates
libxml_use_internal_errors(true);
$do->loadHTML($cur);
# Parse the HTML
# libxml_use_internal_errors(true) above keeps loadHTML from printing
# warnings about invalid HTML in the page.
$xpath = new DOMXPath($do);
$onClickAttrNodeList = $xpath->query('//a[@class="dispo"]/@onclick'); // node list of onclick attributes containing the URLs
$array = array(); // CONVERT NODE LIST OBJECT TO ARRAY
foreach($onClickAttrNodeList as $node){
$array[] = $node;
}
$x=array();
foreach($array as $node) {
for($i = 0; $i < 10; ++$i) {
$x[] = $node->nodeValue; //PARSE ALL LINK AS TABLE
}
}
$randlink = array_rand($x, 10); // get random links from the calendar of available dates
$link = $x[$randlink[0]];
echo '<br/>';
preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $link, $match); //Get URL from the last array
echo "<pre>";
$url = $match[0];
print_r($url[0]);
echo'<br/>';
parse_str( parse_url( $url[0], PHP_URL_QUERY), $arrayurl ); // get the parameters from the URL of available dates to book
var_dump($arrayurl);
/* In this part of the code
I am trying to post the
parameters to book
an appointment; I fail at this step */
$fieldbook['timestamp'] = $arrayurl[0];
$fieldbook['skey'] = $arrayurl[1];
$fieldbook['process'] = $arrayurl[2];
$fieldbook['what'] = $arrayurl[3];
$fieldbook['fg_id'] = $arrayurl[4];
$fieldbook['result'] = $arrayurl[5];
$fieldbook['issuer_view'] = $arrayurl[6];
$datafieldbook = http_build_query($fieldbook);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_URL, 'https://example.com/login/action.php');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $datafieldbook);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, 'https://example.com/login/myapp.php?fg_id=5568094');
$book = curl_exec($ch);
echo'<br/>';
echo $book;
curl_close($ch);
?>
Thank you.
Problem solved: I just added a user agent and some headers.
Thank you, guys :-).
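For readers hitting the same wall: besides the headers, note that parse_str() fills $arrayurl with keys named after the query parameters, so the numeric lookups $arrayurl[0] through $arrayurl[6] above are undefined and those booking fields do not carry the intended values. A minimal sketch, assuming the link's parameter names match what action.php expects; the link and the user-agent string are placeholders, and the helper names are hypothetical.

```php
<?php
// Turn a calendar link into booking POST fields.
// parse_str() keys the result by parameter NAME: $params['skey'] exists, $params[1] does not.
function bookingFieldsFromLink(string $link): array
{
    $query = parse_url($link, PHP_URL_QUERY);
    if (!is_string($query)) {
        return []; // no query string in the link
    }
    parse_str($query, $params);
    return $params;
}

// Browser-like request headers (placeholder values), in the spirit of the
// "user agent and some headers" fix reported above
function applyBrowserHeaders($ch): void
{
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (placeholder)');
    curl_setopt($ch, CURLOPT_HTTPHEADER, ['Accept: */*', 'Connection: Keep-Alive']);
}

// Hypothetical link; the real parameter names come from the onclick attribute
$link = 'https://example.com/login/action.php?timestamp=1700000000&skey=abc&process=book&fg_id=5568094';
$fieldbook = bookingFieldsFromLink($link);
print_r($fieldbook);
// The booking POST body would then be http_build_query($fieldbook)
```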
I am using cURL in my Yii application for the first time.
Here is my code:
$ch = curl_init();
$u='admin';
$p='admin123';
$postvars = '';
foreach($string as $key=>$value) {
$postvars .= $key . "/" . $value . "/";
}
curl_setopt($ch, CURLOPT_URL, 'http://localhost:83/Working-copy/tradiecom/ServiceTypeMaster/Create' );
curl_setopt($ch, CURLOPT_USERPWD, $u.':'.$p);
curl_setopt($ch, CURLOPT_FORBID_REUSE, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, count($string));
curl_setopt($ch, CURLOPT_POSTFIELDS, $postvars);
$json = curl_exec($ch);
if ($json === false)
{
echo 'test';
}
print_r($json);
Now I don't know how to access the POST variables in the ServiceTypeMaster/create action.
I also don't know whether my action is being called at all.
How can I tell?
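The loop above produces a body like key/value/key/value/, which PHP will not split into the intended $_POST keys; a form-encoded body (key=value&key2=value2) will. Assuming Yii 1.1 (as the controller/action URL style suggests), the action can then read the values from $_POST or with Yii::app()->request->getPost('name'). To see whether the action is reached at all, check the HTTP status with curl_getinfo() after curl_exec(). A minimal sketch; $fields is a hypothetical stand-in for the question's $string array, and postAndReport is a hypothetical helper.

```php
<?php
// Hypothetical stand-in for the question's $string array
$fields = ['name' => 'Plumbing', 'status' => 1];

// Form-encoded body: URL-encoded key=value pairs joined by &
$body = http_build_query($fields);
echo $body, PHP_EOL;

// POST the body and report the HTTP status, so you can tell whether the action answered
function postAndReport(string $url, string $body, string $user, string $pass): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERPWD, $user . ':' . $pass);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
    $response = curl_exec($ch);
    return [
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE), // 0 means no HTTP response at all
        'error'  => $response === false ? curl_error($ch) : '',
        'body'   => $response,
    ];
}
```

For example, print_r(postAndReport('http://localhost:83/Working-copy/tradiecom/ServiceTypeMaster/Create', $body, 'admin', 'admin123')): a 200 status with the expected output suggests the action ran, while a 404 or 403 points toward routing or access rules.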