Cant get the html dom by php curl. - php

I have used php CURL to get the html or echo the html. But it is suddenky redirecting, when i am trying with this code.
$cookie = tempnam ("/tmp", "CURLCOOKIE");
$ch = curl_init();
function get_data( $ch, $url, $post, $cookie ){
$agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
//curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
if( $post != '' )
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
return curl_exec($ch);
}
$url = 'https://iapps.courts.state.ny.us/webcivil/FCASSearch?param=I';
$html = get_data( $ch, $url, '', '' );
echo $html; exit;
I have played with these
CURLOPT_RETURNTRANSFER,
CURLOPT_FOLLOWLOCATION,
CURLOPT_COOKIEJAR,
CURLOPT_COOKIEFILE
But still i got redirection when trying to get the html. How can i get the HTML of the page or is there any other thing try ?

Here is a fixed working code to grab the code of the page.
$cookie = tempnam ("/tmp", "CURLCOOKIE");
$ch = curl_init();
function get_data( $curl, $url, $post, $cookie ){
$agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7";
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_USERAGENT, $agent);
curl_setopt($curl, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($curl, CURLOPT_HEADER, 0);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 2);
if( $post != '' )
curl_setopt($curl, CURLOPT_POSTFIELDS, $post);
return curl_exec($curl);
}
$url = 'https://iapps.courts.state.ny.us/webcivil/FCASSearch?param=I';
$html = get_data( $ch, $url, '', '' );
echo htmlspecialchars($html);
But have you seen what you get on this? Almost only JS
which doesnt seem to be very usefull to parse.

You can take idea from this code. Give a path to page from which you want to get html content in live_url.
$live_url = "http://www.example.com/page/header.php";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $live_url);
curl_setopt($ch, CURLOPT_TIMEOUT, 1000);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$content = curl_exec($ch);
$res = curl_getinfo($ch);
curl_close($ch);
echo $content;

Related

php - how to get content with login information

I want to use PHP to get this html content, the final aim I want to get m3u8 link
It requires log-in
http://tv24.vn/livetv/vtv1.html
This is information to test:
user: h2132704#trbvm.com
pass: 12345678
I tried to use curl POST but unsuccessful!
<?php
$username = 'h2132704#trbvm.com';
$password = '12345678';
$loginUrl = 'http://tv24.vn/livetv/';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $loginUrl);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'user='.$username.'&pass='.$password);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$store = curl_exec($ch);
$return=file_get_contents("http://tv24.vn/livetv/vtv1.html"); preg_match('/file: "(.*?)"/', $return, $m3u8);
echo $m3u8;
?>
Any solution?
Thanks much !
Something like this should do it
$username = 'h2132704%40trbvm.com';
$password = '12345678';
$url = 'http://tv24.vn/account/login';
//login form action url
$postinfo = "email=".$username."&password=".$password;
$cookie_file_path = "cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
//set the cookie the site has for certain features, this is optional
curl_setopt($ch, CURLOPT_COOKIE, "cookiename=0");
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $_SERVER['REQUEST_URI']);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postinfo);
curl_exec($ch);
//page with the content I want to grab
curl_setopt($ch, CURLOPT_URL, "http://tv24.vn/livetv/vtv1.html");
//do stuff with the info with DomDocument() etc
$html = curl_exec($ch);
preg_match('/file: "(.*?)"/', $html, $m3u8);
print_r($m3u8[1]);
curl_close($ch);

PHP cURL URL Not Found

I am new to using cURL....whenever I run this I get a page that says "The requested URL /files/ was not found on this server." But, if I go to the link on a browser then it shows up. Please assist. I have looked at similar questions but could not find a solution to my problem.
<?php
$username="user";
$password="pass";
$url="http://website.com/login/";
$cookie="cookie.txt";
$postdata = "login=$username&pass=$password";
//set the directory for the cookie using defined document root var
$dir = DOC_ROOT;
//the info per user with custom func.
$path = $dir;
//login form action url
$cookie_file_path = $path."/cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
//set the cookie the site has for certain features, this is optional
curl_setopt($ch, CURLOPT_COOKIE, "cookiename=0");
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $_SERVER['REQUEST_URI']);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
curl_exec($ch);
//page with the content I want to grab
curl_setopt($ch, CURLOPT_URL, "http://website.com/files/");
//do stuff with the info with DomDocument() etc
$html = curl_exec($ch);
echo $html;
curl_close($ch);
?>
$postdata = array(
'login' => $username,
'pass' => $password
);
curl_setopt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
Change
curl_setopt($ch, CURLOPT_COOKIE, "cookiename=0");
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
to
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
Make sure you've write permission on the current dir so curl can create the cookie file cookie.txt
you should end-up with something like :
<?php
$username="user";
$password="pass";
$url_login="http://website.com/login/";
$url_files="http://website.com/files/";
$cookie="cookie.txt";
$postdata = "login=$username&pass=$password";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url_login);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
curl_setopt($ch, CURLOPT_USERAGENT,"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, $url_files);
$html = curl_exec($ch);
echo $html;
curl_close($ch);
?>
NOTE:
Sometimes, less is more.
The URL you provided http://website.com/files/ has a redirection, so CURL has to have option enabled to follow redirects.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
While in your code you have
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
0 means false, so you are disabling following on HTTP 30x

Login to Remote Site and Exporting XML with PHP cURL

I'm trying to log into IPS Light and exporting a XML file. The url of the export page is unique and cannot be directly referenced in the script; Meaning that I will be required to click the necessary links on the homepage that will redirect me to the XML Export page.
require_once( dirname(__FILE__) . "/configuration.php" );
$username = "username";
$password = "password";
$dir = site_path."/ctemp";
$path = site_path."/ctemp/test";
$url="https://ipslight.upu.org/";
$postinfo = "email=".$username."&password=".$password;
$cookie_file_path = $path."/cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIE, "cookiename=0");
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $_SERVER['REQUEST_URI']);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postinfo);
curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, "https://ipslight.upu.org/IPSLight.dll");
$html = curl_exec($ch);
curl_close($ch);

Remote PHP Login with CURL in Udemy

I've been facing problem in logging into Udemy website using CURL.
Here's my code.
First there was some problem with the security code, "crsf" which I tried to resolve by obtaining it from "teach" url. and then tried logging in. But failed.
Please help me through this. Thanks.
$url="https://www.udemy.com/teach/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_REFERER, $_SERVER['REQUEST_URI']);
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$html = curl_exec($ch);
$dom = new DomDocument();
#$dom->loadHTML($html);
$xpath = new DOMxpath($dom);
// $x("//input[#name='csrf']/#value")
$csrfQuery = $xpath->query("//input[#name='csrf']/#value");
$csrf = $csrfQuery->item(0)->nodeValue;
$values["email"] = "some_email";
$values["password"] = "some_password";
$username = trim($values["email"]);
$password = trim($values["password"]);
$dir = "/store";
$path = $dir;
//login form action url
$postinfo = "isSubmitted=1&csrf=".$csrf."&email=".urlencode($username)."&password=".urlencode($password)."&displayType=json";
$cookie_file_path = $path."/cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
//set the cookie the site has for certain features, this is optional
// curl_setopt($ch, CURLOPT_COOKIE, "cookiename=0");
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $_SERVER['REQUEST_URI']);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postinfo);
curl_exec($ch);
$html = curl_exec($ch);
echo $html;
curl_close($ch);
?>

Get html of https after logging in

I would like to get the html of the page. After some google-ing, i found following code, which isn't working.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERPWD, "$username:$passwd");
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
$output = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
echo $output;
edit: something like this then:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, '');
curl_setopt($ch, CURLOPT_POSTFIELDS,'username='.urlencode($username).'&passwd='.urlencode($passwd));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, "/tmp/my_cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "/tmp/my_cookies.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, false );
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_REFERER, "");
$output = curl_exec($ch);
echo $output
This just shows the target page, it doesn't log in, nor does it return the html code.
edit 2: now tried with $data = array('username' => $username, 'passwd' => $passwd); and curl_setopt($ch, CURLOPT_POSTFIELDS,$data);
This shows the same as above + an error: Request Entity Too Large
That page does not use HTTP Basic Authentication.
You need to make an HTTP request to match what submitting the form would sent, then you need to process the response (which will probably involve storing cookies).

Categories