Log on to a remote protected site using PHP and cURL

I am trying to log on to my company's intranet, which is protected by an RSA token. I managed to find out all the data needed for the login, and it works with this code:
<?php
//init curl
$ch = curl_init();
//Set the URL to work with
curl_setopt($ch, CURLOPT_URL, $loginUrl);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0.1 Safari/604.3.5");
// ENABLE HTTP POST
curl_setopt($ch, CURLOPT_POST, 1);
//Set the post parameters
curl_setopt($ch, CURLOPT_POSTFIELDS, $var);
//Handle cookies for the login
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
//Setting CURLOPT_RETURNTRANSFER variable to 1 will force cURL
//not to print out the results of its query.
//Instead, it will return the results as a string return value
//from curl_exec() instead of the usual true/false.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
//execute the request (the login)
$store = curl_exec($ch);
?>
After logging on, I am logged out instantly, even though sessions are set to last up to 2 hours. How is that normally set? Where would I find this information in the original site's code? I guess it is stored in a cookie? What do I have to do to avoid being logged out right after logging in?
Best regards,
Michael

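I believe it's because the session and cookie information is not available. Even though the login succeeds, on the very next request the site's code checks the cookie value generated at login against the server-side session value.
If they don't match, you are logged out. Your code stores cookies with CURLOPT_COOKIEJAR, but a later request only sends them back if it reuses the same handle or loads the jar with CURLOPT_COOKIEFILE.
I don't think this code will work for your purpose as it stands.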
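To make that concrete, here is a minimal sketch, assuming PHP 8, in which the login POST and the next page request share one cURL handle, so the session cookie set by the login response is sent back to the server. `$loginUrl`, `$postFields`, `$protectedUrl` and `$cookieFile` are placeholders, and `read_cookie_jar()` is a hypothetical helper for checking which cookies were actually stored. Nothing here deals with the RSA token itself.

```php
<?php
// Sketch: log in, then fetch a protected page on the SAME handle so the
// session cookie from the login response is sent along automatically.
// All four parameters are placeholders you must supply.
function login_then_fetch(string $loginUrl, string $postFields, string $protectedUrl, string $cookieFile): string|false
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => $cookieFile, // load cookies and enable the cookie engine
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies when the handle is destroyed
    ]);

    // 1) POST the login form; Set-Cookie headers from the response are kept in memory.
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
    if (curl_exec($ch) === false) {
        return false;
    }

    // 2) Switch back to GET and request a page that needs the session.
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_URL, $protectedUrl);
    return curl_exec($ch);
}

// Hypothetical helper: read a Netscape-format cookie file (the format cURL
// writes) into name => value, to check which session cookie the login produced.
function read_cookie_jar(string $path): array
{
    if (!is_readable($path)) {
        return [];
    }
    $cookies = [];
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        // HttpOnly cookies are written with a "#HttpOnly_" prefix; other '#' lines are comments.
        if (str_starts_with($line, '#HttpOnly_')) {
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif (str_starts_with($line, '#')) {
            continue;
        }
        // domain, subdomains flag, path, secure, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $cookies[$fields[5]] = $fields[6];
        }
    }
    return $cookies;
}
```

If the jar is empty after the login request, the server never set a session cookie, which points at the login request itself rather than at the follow-up.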

Related

How to use token for login without refresh curl

I'm trying to log in to a website remotely, let's say example.com/login. That login page uses a request token, so I first get the request token from the URL like this:
// code for getting token cookies etc
$url = 'http://example.com/login/';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
$doc = curl_exec($ch);
curl_close($ch);
// extract __RequestVerificationToken input field
preg_match('#<input name="__RequestVerificationToken" type="hidden" value="(.*?)"#is', $doc, $match);
$token = $match[1];
// code for redirect to dashboard
$postinfo = "Email=".$username."&Password=".$password."&__RequestVerificationToken=".$token;
// var_dump($token); //debug info
$useragent="Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36";
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postinfo);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
$html = curl_exec($ch);
echo $html;
if (curl_errno($ch)) print curl_error($ch);
curl_close($ch);
So I'm getting the token, but when I try to log in with the next cURL request, $token obviously keeps changing because the page is fetched again. How can I log in to example.com/login from the same cURL script so that $token stays the same?
TIA!
First off, a proper DOM parser is much more reliable than a regex for extracting the token, so use that:
$dom = new DOMDocument();
@$dom->loadHTML($doc);
$token = (new DOMXPath($dom))->query("//input[@name='__RequestVerificationToken']")->item(0)->getAttribute("value");
Now, the token DEFINITELY changes with each new cookie session, and POSSIBLY changes after each failed login attempt and with each page refresh while you are still not logged in.
When you first get the token, you are also assigned a session cookie. To log in with the matching token, you must send that same session cookie with the login request. The easiest way to do this is to let cURL handle cookies automatically with CURLOPT_COOKIEFILE. (You don't need a dedicated file for the cookies: set it to an empty string and cURL keeps them in memory for you.) With that enabled, cURL automatically sends the session cookie with the next request, the login, as long as you make it on the same handle instead of closing the handle in between.
Pro tip: whenever you're debugging cURL code, enable CURLOPT_VERBOSE. It gives lots of useful information, such as all the cookies it received.
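The whole flow described above (GET the form, pull out the token with DOM, POST on the same handle with an in-memory cookie engine) can be sketched as follows. This is a minimal sketch assuming PHP 8 and assuming the form posts back to the same URL it was loaded from; `extract_request_token()` and `login_with_token()` are hypothetical helper names, while the field names `Email`, `Password` and `__RequestVerificationToken` come from the question.

```php
<?php
// Returns the value of the hidden __RequestVerificationToken input, or null.
function extract_request_token(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($dom))->query("//input[@name='__RequestVerificationToken']");
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->getAttribute('value');
}

// GET the form and POST it on ONE handle, so the session cookie issued with
// the token is sent back together with that token.
function login_with_token(string $loginUrl, string $email, string $password): string|false
{
    $ch = curl_init($loginUrl);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => '', // empty string: in-memory cookies, no file
    ]);

    // 1) Fetch the form: the response sets the session cookie and holds the token.
    $form = curl_exec($ch);
    $token = is_string($form) ? extract_request_token($form) : null;
    if ($token === null) {
        return false;
    }

    // 2) POST on the same handle (assumption: the form posts to the same URL).
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query([
        'Email'                      => $email,
        'Password'                   => $password,
        '__RequestVerificationToken' => $token,
    ]));
    return curl_exec($ch);
}
```

http_build_query() is used instead of string concatenation so that characters like & or + in the password are percent-encoded.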

Webpage detecting / displaying different content for curl request - Why?

I need to retrieve and parse the text of public domain books, such as those found on gutenberg.org, with PHP.
To retrieve the content of most webpages I am able to use CURL requests to retrieve the HTML exactly as I would find had I navigated to the URL in a browser.
Unfortunately on some pages, most importantly gutenberg.org pages, the websites display different content or send a redirect header.
For example, when I try to load this target gutenberg.org page, the curl request is redirected to a different but logically related gutenberg.org page. I can visit the target page in my browser with both cookies and JavaScript turned off.
Why is the curl request being redirected while a regular browser request to the same site is not?
Here is the code I use to retrieve the webpage:
$urlToScan = "http://www.gutenberg.org/cache/epub/34175/pg34175.txt";
if(!isset($userAgent)){
$userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36";
}
$ch = curl_init();
$timeout = 15;
curl_setopt($ch, CURLOPT_COOKIESESSION, true );
curl_setopt($ch, CURLOPT_USERAGENT,$userAgent);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
#curl_setopt($ch, CURLOPT_HEADER, 1); // return HTTP headers with response
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_URL, $urlToScan);
$html = curl_exec($ch);
curl_close($ch);
if($html == null){
return false;
}
print $html;
The hint is probably in the url: it says "welcome stranger". They are redirecting every "first" time visitor to this page. Once you have visited the page, they will not redirect you anymore.
They don't seem to be saving a lot of stuff in your browser, but they do set a cookie with a session id. This is the most logical thing really: they check whether there is a session.
What you need to do is connect with curl AND a cookie. You can use your browser's cookie for this, but in case it expires, you'd be better off doing this:
1. Request the page.
2. If the page is redirected, save the cookie (you now have a session).
3. Request the page again with that cookie.
If all goes well, the second request will not redirect, until the cookie/session expires; then you start again. See the PHP manual for how to work with cookies and cookie jars (CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR).
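The request, keep the cookie, request again steps above can be sketched with one handle and cURL's in-memory cookie engine. This is a minimal sketch assuming PHP 8; `is_welcome_redirect()` and `fetch_with_session()` are hypothetical helpers, and detecting the redirect via the `msg=welcome_stranger` query parameter is an assumption based on the redirect URL mentioned in this thread.

```php
<?php
// True if the final URL after redirects is the "welcome stranger" page.
function is_welcome_redirect(string $effectiveUrl): bool
{
    $query = parse_url($effectiveUrl, PHP_URL_QUERY);
    if (!is_string($query)) {
        return false;
    }
    parse_str($query, $params);
    return ($params['msg'] ?? null) === 'welcome_stranger';
}

function fetch_with_session(string $url): string|false
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => '', // in-memory cookies, no file needed
    ]);

    $body = curl_exec($ch);
    // CURLOPT_URL is still $url, so a second curl_exec() repeats the original
    // request, this time with the session cookie stored during the first one.
    if ($body !== false && is_welcome_redirect((string) curl_getinfo($ch, CURLINFO_EFFECTIVE_URL))) {
        $body = curl_exec($ch);
    }
    return $body;
}
```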
The reason one could navigate to the target page in a browser without cookies or JavaScript, but not with curl, was that the website checks the referrer (the Referer header). The page can be loaded without cookies by setting the appropriate Referer header:
curl_setopt($ch, CURLOPT_REFERER, "http://www.gutenberg.org/ebooks/34175?msg=welcome_stranger");
As pointed out by @madshvero, the page can also, surprisingly, be loaded by simply omitting the user agent.

Logging in to a site with curl-php not working when logging into 2shared

I am trying to log in to 2shared with curl-php, but for some reason it just returns the login page and does not set the proper cookies in the cookie file. My code is below. Thanks for any help.
$user = "";
$pass = "";
$cookie = "cookie.txt";
$jsonp = 'jsonp'.time();
if (file_exists($cookie)) {
unlink($cookie);
}
$post = array(
"login" => $user,
"password" => $pass,
"callback" => $jsonp
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.2shared.com/login?callback=".$jsonp);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('X-Requested-With: XMLHttpRequest'));
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
curl_setopt($ch, CURLOPT_REFERER, 'http://www.2shared.com/');
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20100101 Firefox/12.0");
curl_setopt($ch, CURLOPT_ENCODING, "UTF-8" );
$return = curl_exec($ch);
curl_close($ch);
echo $return;
EDIT:
When I log in via the browser and watch the traffic with an HTTP analyzer, I notice that after I hit the login button it returns this data and redirects to the loginRedirect URL, and it sets some cookies that do not appear when I make the php-curl request:
{
"ok":true,
"rejectReason":"",
"loginRedirect":"http://www.2shared.com/account/homeDoorway.jsp;jsessionid=3F253C7C641C7A8402D4AC9872C1CEAE.dc282?rand=0.8112776952920494",
"loggedIn":"myemail@email.com",
"needActivation":false
}
But when I try to log in with the curl-php code above, it returns this data:
jsonp1339804887({
"ok":true,
"rejectReason":"",
"loginRedirect":"http://www.2shared.com/login.jsp?sessionUnavailable=1",
"loggedIn":"",
"needActivation":false
})
As always when doing web scraping, the key is to compare with a session recorded manually in a browser (with LiveHTTPHeaders or a similar tool). Then make sure that your script sends a request as similar to the recorded one as possible.
If you had done that, you would've seen that...
The login form on 2shared doesn't seem to use a multipart formpost, so passing an array ($post) to CURLOPT_POSTFIELDS is wrong: given an array, cURL sends multipart/form-data. It should simply be a string in the form "login=$name&password=$secret". That said, this may not be the only flaw in your approach.
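A minimal sketch of building that urlencoded string. When CURLOPT_POSTFIELDS receives a string, cURL posts it as application/x-www-form-urlencoded, like an ordinary HTML form; building it with http_build_query() rather than plain concatenation also percent-encodes characters such as @ or & in the credentials. `login_post_body()` is a hypothetical helper; the field names `login`, `password` and `callback` are taken from the question.

```php
<?php
// Build an application/x-www-form-urlencoded body from the login fields.
function login_post_body(string $user, string $pass, string $callback): string
{
    return http_build_query([
        'login'    => $user,
        'password' => $pass,
        'callback' => $callback,
    ]);
}

// Usage with an existing handle:
// curl_setopt($ch, CURLOPT_POSTFIELDS, login_post_body($user, $pass, $jsonp));
```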
This may be just a shot in the dark, but it appears to me that you should actually look at the redirect and follow it. The error message does indicate that you're not within a functioning session on the server side, and the session identification is part of the address you would have been redirected to but chose not to follow: ;jsessionid=3F253C7C641C7A8402D4AC9872C1CEAE.dc282. The latter part, ?rand=0.8112776952920494, appears (to me!) to be a random number the system also wants sent back. I take this to be a trivial token mechanism to make sure that the request actually is fresh and not something like a script trying to get in :-)
Also, are you certain that the callback mechanism you use (with time()) makes sense?
Have you tried first requesting the login page normally, watching for the redirect, and then starting the rest of your code from there?
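Following the redirect could look roughly like the sketch below. It assumes PHP 8, that CURLOPT_HEADER is turned off so the response body is only the JSONP text, and that the same handle (with its cookies) is reused. `decode_jsonp()` and `follow_login_redirect()` are hypothetical helpers; the fields `ok`, `loggedIn` and `loginRedirect` come from the two responses quoted in the question. Because the failed response has `"ok":true` but an empty `"loggedIn"`, the sketch checks `loggedIn` rather than `ok`.

```php
<?php
// Decode plain JSON or a JSONP response such as callbackName({...}).
function decode_jsonp(string $body): ?array
{
    if (preg_match('/^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$/s', $body, $m)) {
        $body = $m[1]; // keep only the JSON inside the callback wrapper
    }
    $data = json_decode($body, true);
    return is_array($data) ? $data : null;
}

// Fetch the loginRedirect URL on the same handle, but only after a real login.
function follow_login_redirect(CurlHandle $ch, string $loginBody): string|false
{
    $data = decode_jsonp($loginBody);
    if ($data === null || empty($data['loggedIn']) || empty($data['loginRedirect'])) {
        return false; // e.g. the sessionUnavailable=1 response
    }
    curl_setopt($ch, CURLOPT_HTTPGET, true); // the redirect target is fetched with GET
    curl_setopt($ch, CURLOPT_URL, $data['loginRedirect']);
    return curl_exec($ch);
}
```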

Difference between cURL and web browser?

I am trying to retrieve a web page from the following url:
http://www.medicare.gov/find-a-doctor/provider-results.aspx?searchtype=OHP&specgrpids=922&loc=43615&pref=No&gender=Unknown&dist=25&lat=41.65603&lng=-83.66676
It works when I paste it into a browser, but when I run it through cURL, I receive a page with the following error: "One or more query string parameters of requested url are invalid or has unexpected value, please correct and retry."
It doesn't seem to make a difference if I provide a different userAgent or referrer. There is a redirect, so I use CURLOPT_FOLLOWLOCATION.
Here is my code:
$ch = curl_init($page);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$html = curl_exec($ch);
curl_close($ch);
echo $html;
Any thoughts on why a request like this will work in the browser and not with cURL?
Your browser is sending cookies that cURL is not. Check the cookies you are sending to the site using the browser's developer tools or Fiddler; you'll need to pass the same ones.
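To see what cURL itself sends, so it can be compared with the browser's request, PHP can record the outgoing request headers when CURLINFO_HEADER_OUT is enabled on the handle. A minimal sketch assuming PHP 8; `parse_request_headers()` and `fetch_and_inspect()` are hypothetical helpers, and a header name that appears twice keeps only its last value.

```php
<?php
// Split a raw request header block into its request line and a name => value map.
function parse_request_headers(string $raw): array
{
    $lines = preg_split('/\r\n|\n/', trim($raw));
    $requestLine = (string) array_shift($lines);
    $headers = [];
    foreach ($lines as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue;
        }
        // Header names are case-insensitive, so store them lower-cased.
        $headers[strtolower(trim(substr($line, 0, $pos)))] = trim(substr($line, $pos + 1));
    }
    return ['request_line' => $requestLine, 'headers' => $headers];
}

function fetch_and_inspect(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLINFO_HEADER_OUT, true); // remember the outgoing request headers
    $body = curl_exec($ch);
    $sent = (string) curl_getinfo($ch, CURLINFO_HEADER_OUT);
    return [$body, parse_request_headers($sent)];
}
```

For example, `$req['headers']['cookie'] ?? null` after `[$html, $req] = fetch_and_inspect($page);` shows whether any Cookie header went out at all.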
The problem was with cookies. This particular site needed an ASP.NET_SessionId cookie set in order to respond. I added the following to my cURL request:
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIE, 'ASP.NET_SessionId=ho1pqwa0nb3ys3441alenm45'); // Cookie header: name=value pairs only
I don't know whether any session id will work, but I tried a couple of random ones and they all worked.
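CURLOPT_COOKIE sets the Cookie request header directly, and that header carries only name=value pairs separated by "; ". Attributes such as path= and domain= belong in the server's Set-Cookie header; placed in a Cookie header they are simply sent as extra cookies named path and domain. A minimal sketch of a hypothetical helper that builds the value from an array:

```php
<?php
// Build a Cookie header value ("a=1; b=2") for CURLOPT_COOKIE.
function cookie_header_value(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Usage:
// curl_setopt($ch, CURLOPT_COOKIE, cookie_header_value(['ASP.NET_SessionId' => $sessionId]));
```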

php CURL - multiple independent sessions - need help!

Here is my dilemma...
I basically have a script which, by means of cURL, posts to a third-party website to perform a login and then makes another post to update a user's details based on that login session. Now that my site is getting busy, I have multiple users doing the same thing, and it seems that on occasion curl gets confused and updates one user's details with a different user's information. This is causing real problems.
It seems that the cookie used by one user after a login is being shared by other users, and they end up logging in with the same cookie, confusing the third-party system. My code is posted below; I need to use the cookie file and cookie jar to maintain the session so I can do what I need to do. But it seems like the same cookie is being reused by all users....
Does that make sense? Is there anything I can do to change this? Please advise....
Thanks so much!
Below is the code I use to both log in and post the user update:
function hitForm($postURL, $postFields, $referer="", $showerr = FALSE, $ispost = TRUE) {
global $islocal, $path_escape;
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt");
curl_setopt($ch, CURLOPT_URL, $postURL);
if ($ispost)
curl_setopt($ch, CURLOPT_POST, 1);
else
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_REFERER, $referer);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
$ret = curl_exec($ch);
if ($error = curl_error($ch)) {
if ($showerr)
echo 'ERROR: ' . $error;
return -1;
exit;
}
$CU_header = curl_getinfo($ch);
$CU_header["err"] = curl_errno($ch);
$CU_header["errmsg"] = curl_error($ch);
curl_close($ch);
$returnout = $ret;
//for debugging purposes for now we are logging all form posts
SaveLog("hitform", "F[".$this->curruserid." - ".$this->currfunc." - ".date("d-m-y h:i:s")."]".$postFields);
return $ret;
}
You're using the same cookies.txt file for every session, so that's where the shared cookie problem is coming from. You'd need to specify a separate file for each parallel session you want to run.
You are using a shared cookie jar for all users. Each user needs a separate cookie jar.
You need to use different cookie files for each user.
I assume your postFields includes some unique identifier for each user (like a user id, or a username), so try something like:
$cookie_file = 'cookies_' . $postFields['user_id'] . '.txt';
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
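A variant of the same idea that does not rely on a user id being present in `$postFields`: give each login/update sequence its own throw-away jar created with tempnam(), which guarantees a unique file name and keeps user-supplied values out of file paths. A minimal sketch assuming PHP 8; `with_private_cookie_jar()` is a hypothetical helper.

```php
<?php
// Run $work with a fresh, unique cookie jar path and delete the jar afterwards.
function with_private_cookie_jar(callable $work): mixed
{
    $jar = tempnam(sys_get_temp_dir(), 'cj_');
    if ($jar === false) {
        throw new RuntimeException('Could not create a cookie jar file');
    }
    try {
        // $work should use $jar for both CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR
        // on every request that belongs to this one user's session.
        return $work($jar);
    } finally {
        if (is_file($jar)) {
            unlink($jar); // nothing from this session is left for the next user
        }
    }
}
```

The login and the update would both be made inside one callback, passing `$jar` to a version of `hitForm()` that accepts the cookie file as a parameter (a hypothetical change; the original hard-codes "cookies.txt").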
As far as I understand the problem, your script is getting wrong user information. How do you store user info anyway?
I'd say that's the source of the problem - you don't assign a unique identifier to user info, and that's where it gets nasty ;)
So, first of all, I'd associate session id with user information (or let's say, store user information in session, which is unique for everyone), and load it from there. And I guess it should do the trick ;)
