I am a member of Lynda.com. I want to fetch an HTML page from their site and save it to my disk. The problem is that whenever I fetch a page via cURL, I get the non-member page (it asks me to sign up), and I can't understand why I can't get the members' page :(
My code:
get_remote_file_to_cache();

function get_remote_file_to_cache()
{
    $the_site = "http://www.lynda.com/AIR-3-0-tutorials/Flex-4-6-and-Mobile-Apps-New-Features/90366-2.html";
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $the_site);
    // CURLOPT_COOKIE takes a literal "name=value; name2=value2" string, not a file name.
    // To send cookies stored in a file, use CURLOPT_COOKIEFILE with an absolute path.
    curl_setopt($curl, CURLOPT_COOKIEFILE, __DIR__ . '/cookie.txt');
    // Return the body as a string from curl_exec(); it is written to disk below.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    // Leave the response headers out of the saved HTML.
    curl_setopt($curl, CURLOPT_HEADER, false);
    $http_headers = array(
        'Host: www.lynda.com',
        'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2',
        'Accept: */*',
        'Accept-Language: en-us,en;q=0.5',
        'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'Connection: keep-alive'
    );
    curl_setopt($curl, CURLOPT_HTTPHEADER, $http_headers);
    // Send the request once and keep the result (each curl_exec() call is a new request).
    $contents = curl_exec($curl);
    $httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    if ($httpCode == 404)
    {
        touch('cache/404_err.txt');
    }
    else
    {
        file_put_contents('cache/temp_file.html', $contents);
    }
}
I am on Windows 7, running this on WAMP.
One thing I am not sure about is whether the cookie.txt file is being read at all (I'm not sure the path is correct, so I put cookie.txt both in the server root and in the directory I am running this script from).
Thanks in advance!
----------- Found some code via the online manual ---------
// $url = page to POST data to
// $ref_url = tell the server which page you came from (spoofing)
// $data = URL-encoded POST body, e.g. "field1=a&field2=b"
// $login = 'true' will start with a clean cookie file
// $proxy = proxy data
// $proxystatus = do you use a proxy? 'true'/'false'
function curl_grab_page($url, $ref_url, $data, $login, $proxy, $proxystatus)
{
    // Use an absolute path so the cookie file does not depend on the working directory.
    $cookie_file = __DIR__ . '/ryanCookie.txt';
    if ($login == 'true') {
        // Truncate the cookie file so the session starts clean.
        $fp = fopen($cookie_file, 'w');
        fclose($fp);
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);  // received cookies are saved here when the handle is released
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file); // cookies are read from here and sent with the request
    // Reuse the visitor's browser User-Agent when there is one (not available from the CLI).
    curl_setopt($ch, CURLOPT_USERAGENT, isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
    curl_setopt($ch, CURLOPT_TIMEOUT, 40);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    if ($proxystatus == 'true') {
        curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, true);
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }
    // Certificate checks are disabled here, which is insecure outside of testing.
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_REFERER, $ref_url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    $result = curl_exec($ch);
    // Release the handle (this writes the cookie jar) before returning;
    // statements placed after a return never run.
    curl_close($ch);
    return $result;
}
echo curl_grab_page("https://www.lynda.com/login/login.aspx", "http://www.lynda.com/", "simple_username=*******&simple_password=*******", "true", "null", "false")."done!";
But it still does not work :(
This is the page where I got the above code: http://php.net/manual/en/function.curl-setopt.php
You need to understand how the web and HTTP work. When you visit a website, it usually gives you cookies to track your state, and you start out as a visitor who is not logged in. After you press the login button, the server marks you as logged in and stores that state, either in a server-side session or in cookies in your browser.
Back to your question: since you want to access a members-only page, you first need to learn how lynda.com's login works. The steps below are general:
1. Load the login page and read the form's fields.
2. Fill in the form fields with your login details and send the form back to the server.
3. Store the cookies the server sends back.
4. Load the member page (don't forget to send the cookies from step 3) and fetch the HTML.
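Steps 1 and 2 can be sketched as follows, assuming the login page is an ordinary HTML form: read every named `<input>` of the form (hidden anti-forgery tokens included) and overlay your credentials before POSTing it back. `extract_form_fields()` and `build_login_post()` are hypothetical helpers, and the field names in the test are placeholders; use the real names from the site's form (e.g. the `simple_username`/`simple_password` fields in the call above).

```php
<?php
// Sketch only: collects every named <input> of one form. Checkboxes, radio
// buttons, <select> and <textarea> fields are not treated specially.
function extract_form_fields(string $html, int $formIndex = 0): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $form = $xpath->query('//form')->item($formIndex);
    if ($form === null) {
        return [];
    }
    $fields = [];
    foreach ($xpath->query('.//input[@name]', $form) as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

function build_login_post(string $loginPageHtml, array $credentials): string
{
    // Credentials override any pre-filled value with the same field name.
    return http_build_query(array_merge(extract_form_fields($loginPageHtml), $credentials));
}
```

Usage, assuming `$ch` is one handle with CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE pointing at the same absolute path: GET the login page, pass its HTML and your credentials to `build_login_post()`, POST the result to the form's `action` URL, then request the member page on the same handle so the session cookies are sent along.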
For more information, you can look at these resources:
http://www.codingforums.com/showthread.php?t=252335
http://simpletest.sourceforge.net/en/browser_documentation.html
https://gist.github.com/3697293
You may need to send an Authorization header containing your username and password for the site.
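This only applies if the site actually uses HTTP authentication (an assumption; a site with an HTML login form needs the form-based login described in the other answer). In that case the header can be built by hand or left to cURL. A minimal sketch; both function names are hypothetical:

```php
<?php
// Basic scheme: base64 of "user:password".
function basic_auth_header(string $user, string $pass): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $pass);
}

// Equivalent cURL options, letting cURL build the header itself.
function basic_auth_options(string $user, string $pass): array
{
    return [
        CURLOPT_HTTPAUTH => CURLAUTH_BASIC,
        CURLOPT_USERPWD  => $user . ':' . $pass,
    ];
}
```

Pass the header inside the CURLOPT_HTTPHEADER array, or apply the options with `curl_setopt_array($ch, basic_auth_options($user, $pass))`; letting cURL build it avoids encoding mistakes.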
To get the member page, you need to log in to the website. To do that, you need to:
visit the login page
make the same request your browser makes to submit the login credentials
fetch the member page
Alternatively, you could copy the cookies from your browser after logging in and use them in cURL with curl_setopt($ch, CURLOPT_COOKIE, 'a=b; c=d');, but this might not work, because the website may also check your IP address or other session details.
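A small sketch of that approach, assuming you copy the raw `Cookie:` request-header value from your browser's developer tools: parse it into name/value pairs (so you can inspect or edit them) and join them back into the `a=b; c=d` string CURLOPT_COOKIE expects. Both helper names are hypothetical.

```php
<?php
// Parse a raw "Cookie:" header value ("a=b; c=d") into [name => value].
function parse_cookie_header(string $header): array
{
    $cookies = [];
    foreach (explode(';', $header) as $part) {
        $part = trim($part);
        if ($part === '' || strpos($part, '=') === false) {
            continue; // skip empty or malformed fragments
        }
        [$name, $value] = explode('=', $part, 2); // values may themselves contain '='
        $cookies[trim($name)] = trim($value);
    }
    return $cookies;
}

// Join [name => value] pairs into the string CURLOPT_COOKIE expects.
function cookie_option_value(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}
```

Usage: `curl_setopt($ch, CURLOPT_COOKIE, cookie_option_value(parse_cookie_header($copiedHeader)));`, where `$copiedHeader` is the value you copied.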
Related
I want to fetch the URL from a profile's bio with PHP.
URL: https://www.instagram.com/sukhcha.in/ (it can be anyone's profile)
I tried simple_html_dom, but it always shows an HTTPS error when fetching the HTML from the URL.
As I advised in my comment, you should use cURL, because it supports HTTPS:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_TIMEOUT, 0); // timeout in seconds (0: no timeout)
curl_setopt($ch, CURLOPT_HEADER, false); // do not include the response headers in the output
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0'); // set the User-Agent header
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the content instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skips certificate verification (insecure); the proper fix is a CA bundle via curl.cainfo in php.ini or CURLOPT_CAINFO
curl_setopt($ch, CURLOPT_URL, 'https://www.instagram.com/sukhcha.in/');
$content = curl_exec($ch);
if ($content === false) {
    echo 'cURL error: ' . curl_error($ch);
}
curl_close($ch);
?>
Then you have to use XPath on your $content variable to extract the part you want.
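A minimal sketch of the XPath step using PHP's DOM extension. The `//a[@href]` expression, which collects every link, is only an illustration; the real page's markup will need its own expression, and `extract_links()` is a hypothetical name.

```php
<?php
// Return the href of every <a> element in an HTML string (e.g. $content from curl_exec()).
function extract_links(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate malformed real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}
```

Usage: `$links = extract_links($content);`. Anchors without an `href` are skipped by the `[@href]` predicate.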
You can use cURL to get the data:
$url = 'https://weather.com/weather/tenday/l/USMO0460:1:US';
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HTTPHEADER, array('Content-Type: application/x-www-form-urlencoded'));
$curl_response = curl_exec($curl);
// Debug data
echo '<pre>';
print_r($curl_response);
echo '</pre>';
// Close the cURL handle
curl_close($curl);
Using cURL, I'm trying to transfer a zip file from another server to my server.
After authentication, this second server gives me a URL that carries the credentials as query parameters:
http://content.website.com/file_to_download.zip?nvb=20160622094506&nva=20160622095506&hash=089366e5fe3da46f9caf2
If I open this URL in another web browser or in a private session, I can download the content without problems (it doesn't ask me to log in), but if I pass the URL to my function I get an empty file.
This is my function:
function download ($zipUrl, $zipFilename){
$fp = fopen ($zipFilename, 'w+');
$ch = curl_init($zipUrl);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 0);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2'));
curl_setopt($ch, CURLOPT_ENCODING, "");
curl_setopt($ch, CURLOPT_POSTFIELDS, "nvb=20160622094506&nva=20160622095506&hash=089366e5fe3da46f9caf2");
curl_exec($ch);
curl_close($ch);
fclose($fp);
}
What's wrong?
If curl_exec returns false, you can read the error with curl_error(). Also check which HTTP status code the server returned (curl_getinfo($ch, CURLINFO_HTTP_CODE)), so you can see clearly what's going on.
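A minimal sketch of that debugging approach. Beyond what the answer says, it assumes the signed URL is meant to be fetched with a plain GET, the way a browser fetches it when you open the link, so the query string stays in the URL instead of being re-sent as a POST body as in the question. `download_signed_url()` is a hypothetical name; it returns `null` on success or an error message.

```php
<?php
function download_signed_url(string $url, string $filename): ?string
{
    $fp = fopen($filename, 'wb');
    if ($fp === false) {
        return "cannot open $filename for writing";
    }
    $ch = curl_init($url); // the nvb/nva/hash parameters stay part of the URL
    curl_setopt_array($ch, [
        CURLOPT_HTTPGET        => true,  // plain GET, no POST body
        CURLOPT_FILE           => $fp,   // stream the body straight to disk
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_FAILONERROR    => true,  // treat HTTP status >= 400 as a cURL error
    ]);
    $ok = curl_exec($ch);
    $error = $ok
        ? null
        : curl_error($ch) . ' (HTTP ' . curl_getinfo($ch, CURLINFO_HTTP_CODE) . ')';
    curl_close($ch);
    fclose($fp);
    if ($error !== null) {
        unlink($filename); // do not leave an empty or partial file behind
    }
    return $error;
}
```

Usage: `$err = download_signed_url($zipUrl, $zipFilename); if ($err !== null) { echo $err; }`. The message then shows whether the request failed at the transport level or the server answered with an error status.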
I'm new to cURL, and I've searched the internet for a solution but can't find one. I'm trying to fill in a remote form using cURL and send the data by POST. The problem is that the external website has some security measures. One of them is that, to complete the form, I need to get a generated value and keep the cookie. The external page's code reads:
document.getElementById('sell_session').value = readCookie('classified_session');
My code is this:
$cookie_file = "/home/reelonhe/public_html//temp/cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,'http://www.olx.com.ar/posting.php?categ_id=857');
curl_setopt($ch , CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2');
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept-Language: es-es,en"));
curl_setopt($ch, CURLOPT_SSLVERSION,3);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
$result1 = curl_exec($ch);
$error = curl_error($ch);
$contents = curl_exec($ch);
$httpcode = curl_getinfo($ch,CURLINFO_HTTP_CODE);
curl_close($ch);
echo $error;
I tried an absolute path for the cookie file, a relative path, etc.; the folder has read and write permissions, and nothing works. I don't know what else to do.
In your cookie file path
$cookie_file = "/home/reelonhe/public_html//temp/cookie.txt";
There seems to be an extra '/' before the temp folder.
It should be:
$cookie_file = "/home/reelonhe/public_html/temp/cookie.txt";
I am not sure it will solve your problem.
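The page's script copies the `classified_session` cookie into the hidden `sell_session` field. A minimal sketch of doing the same in PHP, assuming the cookie has already been saved in the Netscape-format jar that CURLOPT_COOKIEJAR writes. cURL writes the jar when the handle is released; you can force it earlier with `curl_setopt($ch, CURLOPT_COOKIELIST, 'FLUSH')`. Both function names are hypothetical.

```php
<?php
// Read one cookie value from a Netscape-format cookie jar written by cURL.
function read_cookie_from_jar(string $jarFile, string $name): ?string
{
    if (!is_readable($jarFile)) {
        return null;
    }
    foreach (file($jarFile, FILE_IGNORE_NEW_LINES) as $line) {
        $line = rtrim($line, "\r");
        if ($line === '') {
            continue;
        }
        // cURL prefixes HttpOnly cookies with "#HttpOnly_"; other '#' lines are comments.
        if (strncmp($line, '#HttpOnly_', 10) === 0) {
            $line = substr($line, 10);
        } elseif ($line[0] === '#') {
            continue;
        }
        // Fields: domain, subdomains flag, path, secure, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) >= 7 && $fields[5] === $name) {
            return $fields[6];
        }
    }
    return null;
}

// Mimic readCookie('classified_session') -> sell_session, then URL-encode the form.
function build_post_with_session(string $jarFile, array $fields): ?string
{
    $session = read_cookie_from_jar($jarFile, 'classified_session');
    if ($session === null) {
        return null; // cookie not in the jar (yet)
    }
    $fields['sell_session'] = $session;
    return http_build_query($fields);
}
```

Usage: after the first `curl_exec()`, flush the jar, call `build_post_with_session($cookie_file, [/* the form's other fields */])`, and send the result with CURLOPT_POSTFIELDS. If it returns `null`, the cookie never reached the jar, which also tells you whether the cookie file is being written at all.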
I am trying to scrape a list of bills from a website after logging into it via cURL, but on one of the pages the content is not the same as in my browser (instead of showing a list of bills, it shows "Your bill history cannot be displayed"). I can correctly scrape other pages that are only available after login, so I'm quite puzzled about why that page refuses to display the bill history when I use cURL.
Here is my code:
//Load login page
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.domain.com/login');
curl_setopt($ch, CURLOPT_REFERER, 'https://www.domain.com');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieLocation);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieLocation);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
$webpage = curl_exec($ch);
//Submit post to login page to authentify
$postVariables = http_build_query(array(
    'emailAddress' => $username,
    'password'     => $password,
)); // URL-encodes the values (e.g. '@', '&' in passwords)
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postVariables);
curl_setopt($ch, CURLOPT_URL, 'https://www.domain.com/login/POST.servlet');
curl_setopt($ch, CURLOPT_REFERER, 'https://www.domain.com/login');
$webpage = curl_exec($ch);
//Go to my account main page now that we are logged in
curl_setopt($ch, CURLOPT_HTTPGET, true); // switch the handle back from POST to GET
curl_setopt($ch, CURLOPT_URL, 'https://www.domain.com/My_Account');
curl_setopt($ch, CURLOPT_REFERER, curl_getinfo($ch, CURLINFO_EFFECTIVE_URL)); // last URL fetched, after redirects
$webpage = curl_exec($ch); //shows the same content as in the browser
$accountNumber = return_between($webpage, 'id="accountID1">', '<', EXCL); //this is correctly found
//Go to bills page
curl_setopt($ch, CURLOPT_URL, 'https://www.domain.com/Bill_History/?accountnumber='.$accountNumber);
curl_setopt($ch, CURLOPT_REFERER, 'https://www.domain.com/My_Account');
$webpage = curl_exec($ch); //Not showing the same content as in the browser
The last curl_exec is the one that doesn't work properly.
I have checked the logic of the page extensively and used Tamper Data to analyse what was going on: there doesn't seem to be any JavaScript/AJAX call that would load the bill history separately, and no POST request. As far as I can see, the bill history should be displayed on page load.
Any ideas as to what I could try to fix it or what could be the problem? The fact that it works on other pages is especially puzzling.
Thanks in advance!
EDIT: It still doesn't work, but I have found another page on their site where I can get what I need and where the content is displayed correctly, so I no longer need a solution.
You might add additional header fields that "real" browsers usually transmit:
$header[] = 'Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5';
$header[] = 'Connection: keep-alive';
$header[] = 'Keep-Alive: 300';
$header[] = 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7';
$header[] = 'Accept-Language: en-us,en;q=0.5';
Just to name a few.
If you happen to use Firefox, get the handy "Live HTTP Headers" add-on and check which headers your browser sends when loading the relevant page. Then send the same ones from cURL.
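A small helper for that, sketched under the assumption that you start from the headers listed above and then override or add whatever your browser actually sent (for example a Referer). Header names are compared case-insensitively, as HTTP requires. `browser_headers()` is a hypothetical name.

```php
<?php
// Build a CURLOPT_HTTPHEADER list from browser-like defaults plus overrides.
function browser_headers(array $overrides = []): array
{
    $headers = [
        'Accept'          => 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5',
        'Connection'      => 'keep-alive',
        'Keep-Alive'      => '300',
        'Accept-Charset'  => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'Accept-Language' => 'en-us,en;q=0.5',
    ];
    foreach ($overrides as $name => $value) {
        // Drop any existing entry with the same name, ignoring case.
        foreach (array_keys($headers) as $existing) {
            if (strcasecmp($existing, $name) === 0) {
                unset($headers[$existing]);
            }
        }
        $headers[$name] = $value;
    }
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}
```

Usage: `curl_setopt($ch, CURLOPT_HTTPHEADER, browser_headers(['Referer' => 'https://www.domain.com/My_Account']));`.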
Please help, I am trying to solve this problem. I need to log in to Facebook, Twitter or any other website from a PHP script running on my server. I normally do that with cURL, saving the cookies to a predefined file.
But now I need something new: I need to stay logged in in my browser after the script finishes the login process.
Is this something simple that I can't see, or am I heading into complicated territory?
Something tells me I would need to use JavaScript to set all the cookies and send the login form data?
If anyone has logged in to Facebook or Twitter with JavaScript, can you share some tips or the complete script with me, please?
Thanks for any tips, and for explaining the overall logic to me.
I would look into Facebook Connect.
For your next job :) After a little searching, I found a script that you (and sometimes I) need.
/*
* Login to facebook
* $login_email : Account to login with
* $login_pass : Account password
*
* Returns true if the request completed without a cURL error, false otherwise
* (this does not check whether Facebook accepted the credentials)
* Echoes any login error code
*
* Matt Smith - geekalicio.us
* Apr 23, 2009
*/
function fb_login($login_email, $login_pass){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://login.facebook.com/login.php?login_attempt=1');
curl_setopt($ch, CURLOPT_POSTFIELDS,'charset_test=%E2%82%AC%2C%C2%B4%2C%E2%82%AC%2C%C2%B4%2C%E6%B0%B4%2C%D0%94%2C%D0%84&locale=en_US&email='.urlencode($login_email).'&pass='.urlencode($login_pass).'&pass_placeholder=&charset_test=%E2%82%AC%2C%C2%B4%2C%E2%82%AC%2C%C2%B4%2C%E6%B0%B4%2C%D0%94%2C%D0%84');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIEJAR, str_replace('\\','/',dirname(__FILE__)).'/fb_cookies.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, str_replace('\\','/',dirname(__FILE__)).'/fb_cookies.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 GTB5");
curl_exec($ch);
$err = curl_errno($ch);
curl_close($ch);
if ($err != 0){
echo 'error='.$err."\n";
return(false);
} else {
return(true);
}
}
and then you can load the home page with:
if (fb_login($login_email,$login_pass)){
$ch = curl_init();
// Request the home page with a plain GET, reusing the session cookies saved by fb_login()
curl_setopt($ch, CURLOPT_URL, 'https://www.facebook.com/');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIEJAR, str_replace('\\','/',dirname(__FILE__)).'/fb_cookies.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, str_replace('\\','/',dirname(__FILE__)).'/fb_cookies.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 GTB5");
$html = curl_exec($ch);
curl_close($ch);
echo $html;
}
The whole script I'm using is at http://pastie.org/619912.
And yes, use it for good, not for evil :)
I don't think this is supposed to be possible. When your PHP script logs in, it gets an authentication token/cookie for Facebook. That cookie is private and not supposed to be used on any other machine. There are hackish ways to do it, but none I can recommend.