I'm writing a web scraper for a project that needs to log in and save some pages. Even after logging in and saving cookie.txt, it redirects back to the login page, so it looks like the login is not succeeding.
Here is my code:
<?php
$ch = curl_init();
$cookie_file_path = 'cookie.txt';
$cookie_file_path = realpath($cookie_file_path);
$data = array();
$data['txtUser'] = "username";
$data['txtPass'] = "password";
$postStr = "";
foreach($data as $key=>$d){
$postStr .= $key.'='.urlencode($d).'&';
}
$postStr = substr($postStr,0,-1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$agent = $_SERVER["HTTP_USER_AGENT"];
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
//new ones
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_URL,"http://madstore.su/login.php");
curl_setopt($ch,CURLOPT_POST,TRUE);
curl_setopt($ch,CURLOPT_POSTFIELDS,$postStr);
curl_exec ($ch); // execute the curl command
echo 'Curl error: ' . curl_error($ch); // no errors
curl_close ($ch);
unset($ch);
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_URL,"http://madstore.su/index.php");
//new ones
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_exec ($ch);
echo 'Curl error: ' . curl_error($ch);
curl_close ($ch);
?>
I have read nearly all the related questions on StackOverflow and have been searching Google for hours.
Here is what I'm getting in cookie.txt:
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.
#HttpOnly_.madstore.su TRUE / FALSE 1577145000 __cfduid d3f365e8218ab84f921e43db0d1500e7c1391327438626
madstore.su FALSE / FALSE 0 PHPSESSID t41g1j9cdl800e9qdj2pq96ef1
Here is what I'm getting as curl output (no error is reported, only these response headers):
HTTP/1.1 302 Found
Server: cloudflare-nginx
Date: Sun, 02 Feb 2014 08:01:40 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Powered-By: PHP/5.1.6
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
location: login.php
CF-RAY: f655ad243d007e5-LAX
HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Sun, 02 Feb 2014 08:01:41 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Powered-By: PHP/5.1.6
CF-RAY: f655ad5141f07e5-LAX
I would highly appreciate it if anyone could help me with this issue.
Your approach is wrong. First request the login page (http://madstore.su/login.php) with curl so that its cookies are stored in the cookie file.
Then send the login POST with curl, using the cookies saved earlier. You are also missing this POST parameter, so add it to your data:
$data['btnLogin'] = "Log in";
Once the login is done, fetch the required page with a final curl GET request.
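The three steps described above could be sketched roughly as follows. This is a minimal sketch, not a tested fix for this particular site: the helper names (`buildLoginFields`, `request`, `loginAndFetch`) are hypothetical, and it assumes the form only needs the three fields mentioned in this thread.

```php
<?php
// Hypothetical helper: the form fields, including the submit button
// parameter that the answer says was missing.
function buildLoginFields(string $user, string $pass): array
{
    return [
        'txtUser'  => $user,
        'txtPass'  => $pass,
        'btnLogin' => 'Log in',
    ];
}

// Hypothetical helper: one request that reads from and writes to the
// same cookie file. Pass $postFields to send a POST instead of a GET.
function request(string $url, string $cookieFile, ?array $postFields = null)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // send stored cookies
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // save received cookies
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    if ($postFields !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postFields, '', '&'));
    }
    $body = curl_exec($ch); // string on success, false on failure
    // The cookie jar is written when the handle is closed
    // (or freed, on PHP 8 and later).
    curl_close($ch);
    return $body;
}

// Hypothetical helper: GET the login page, POST the credentials, GET the target.
function loginAndFetch(string $base, string $cookieFile, string $user, string $pass)
{
    request($base . '/login.php', $cookieFile);                                  // 1. collect session cookies
    request($base . '/login.php', $cookieFile, buildLoginFields($user, $pass)); // 2. log in
    return request($base . '/index.php', $cookieFile);                           // 3. fetch the page
}
```

A call would look like `loginAndFetch('http://madstore.su', __DIR__ . '/cookie.txt', 'username', 'password')`. Note that `realpath()` returns `false` for a file that does not exist yet, so passing a plain absolute path is safer on a first run.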
Related
I'm running the following code:
$username = 'alex';
$loginUrl = 'https://mysite/login';
$ckfile = '/home/alex/www/storage/logs/cookie.txt';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $loginUrl);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'user=' . $username);
curl_setopt($ch, CURLOPT_COOKIEJAR,$ckfile);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$resp = curl_exec($ch);
$status_code = curl_getinfo($ch); // transfer info is only populated after curl_exec()
curl_close($ch);
Returning $resp gives:
HTTP/1.1 302 Found
Date: Sat, 26 Nov 2016 07:08:59 GMT
Expires: Mon, 02 Aug 1999 00:00:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Connection: close
Set-Cookie: mnproxy=tEiyCofS5YEOiFe; Path=/; Domain=.mysite
Location: http://mysite/connect?session=stEiyCofS5YEOiFe&url=menu
Vary: Accept-Encoding
Content-Length: 352
Content-Type: text/html; charset=iso-8859-1
Set-Cookie: TS0110ac54=01d9c2a5b1932847d951b9185a227fd3d4bfdf23358732abbed5a18dc8b027fc893bda0c4bf3720f253354b709cb68a505ec9cac6dc051e3e792ddf322a734b650b6c33f2e; Path=/; Domain=.mysite
and cookie.txt contains:
.auth.mysite TRUE / FALSE 0 TS0110ac54 01d9c2a5b1932847d951b9185a227fd3d4bfdf23358732abbed5a18dc8b027fc893bda0c4bf3720f253354b709cb68a505ec9cac6dc051e3e792ddf322a734b650b6c33f2e
As you can see, none of the cookies from the response headers is saved in the cookie file except TS0110ac54.
What am I doing wrong?
I'm trying to create a proxy in PHP. When I visit the page http://example.com everything is OK, but using cURL, I always get a 401 Unauthorized Error.
My code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, getallheaders());
curl_setopt($ch, CURLOPT_URL, 'http://example.com');
print_r(curl_exec($ch));
The response:
HTTP/1.1 401 Unauthorized
Content-Type: text/html
Server: Microsoft-IIS/7.5
WWW-Authenticate: Negotiate
WWW-Authenticate: NTLM
X-Powered-By: ASP.NET
Date: Thu, 20 Oct 2016 01:30:11 GMT
Content-Length: 1293
My problem concerns curl in a PHP file: the JavaScript window.location returned by the server gets followed, and I cannot manage to bypass that behavior.
I have written a script that connects to a website with a user authentication form. Overall, the script works:
1st request: GET request to obtain a PHP session in a cookie
2nd request: POST request with the cookie and POST data containing the user/password
Problem: I am always redirected by a JavaScript window.location=XXXX in the server's answer.
For information, I use WampServer version 2.5 / PHP 5.5.12.
My script is called from a web browser at this URL: http://localhost/glpiv2/rechercheDerniersSuivisV2.php
First, I create a cookie:
function createCookie (){
global $proxy;
global $proxyauth;
global $cookies_file;
global $timeout;
$url='https://xxx.xxxx.xxx.xxx/glpi/index.php';
$ch = curl_init();
// Proxy Authentication, keep cool with security
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, true);
curl_setopt($ch, CURLOPT_PROXY, $proxy);
curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyauth);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_POST , false);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch,CURLOPT_HTTPHEADER,array('User-Agent: Mozilla/6.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko','DNT: 1','Connection: Keep-Alive'));
// => WRITE A NEW COOKIE FILE
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies_file);
// => ESTABLISH A NEW SESSION
curl_setopt ($ch, CURLOPT_COOKIESESSION, true);
$file_contents = curl_exec($ch);
// On error
if(curl_errno($ch)){
// Display the corresponding error message
echo "ERROR curl_exec: ".curl_error($ch);
}
curl_close($ch);
}
Server response header of the first request:
Server: Apache
Set-Cookie: PHPSESSID=3c5939450c6811b8df981f83c9539f64; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, precheck=0
Pragma: no-cache
Content-Length: 253
Connection: close
Content-Type: text/html
The cookie is created with PHPSESSID: OK
Second, I send a POST request with the newly created cookie to authenticate with a real user/password:
function authenticateSession(){
echo "Starting authenticateSession";
global $proxy;
global $proxyauth;
global $cookies_file;
global $timeout;
global $authenticationGLPIPost;
$url='https://xxx.xxx.xxx.xxx/glpi/login.php';
$ch1 = curl_init();
// Proxy SSL and other stuff
curl_setopt($ch1, CURLOPT_HTTPPROXYTUNNEL, TRUE);
curl_setopt($ch1, CURLOPT_PROXY, $proxy);
curl_setopt($ch1, CURLOPT_PROXYUSERPWD, $proxyauth);
curl_setopt($ch1, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch1, CURLOPT_SSL_VERIFYHOST, 0);
// Post preparation
curl_setopt ($ch1, CURLOPT_URL, $url);
curl_setopt($ch1, CURLOPT_HEADER, FALSE);
curl_setopt($ch1, CURLOPT_POST , TRUE);
// POST DATA with variable containing user / password
curl_setopt($ch1, CURLOPT_POSTFIELDS, $authenticationGLPIPost);
curl_setopt($ch1, CURLOPT_COOKIEFILE, $cookies_file);
curl_setopt ($ch1, CURLOPT_CONNECTTIMEOUT, $timeout);
$file_contents = curl_exec($ch1);
curl_close($ch1);
}
Server response header and body of the second request:
Server: Apache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, precheck=0
Pragma: no-cache
Content-Length: 269
Connection: close
Content-Type: text/html
<script language=javascript>
NomNav = navigator.appName;
if (NomNav=='Konqueror'){
window.location="/glpi/front/central.php?tokonq=fsrb7s";
} else {
window.location="/glpi/front/central.php";
}
</script>
=> The problem is the JavaScript window.location redirection in this response, which redirects me to http://localhost/glpi/front/central.php
The server response is displayed in the web browser.
I suspect that the web browser executes the returned JavaScript and redirects me.
I verified this with an intercepting proxy by altering the server response: when I remove the JavaScript block or change the value of window.location, the redirect behavior changes accordingly.
I tried each of these options to avoid being redirected, but none of them works:
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 0);
curl_setopt ($ch, CURLOPT_MAXREDIRS, 0);
This means that the first time I run the script and these two functions are called, I am always redirected to the site's authentication-success page, i.e. the relative path /glpi/front/central.php.
Oh, I see.
$file_contents = curl_exec($ch1);
Return Values of curl_exec
Returns TRUE on success or FALSE on failure. However, if the CURLOPT_RETURNTRANSFER option is set, it will return the result on success, FALSE on failure.
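Unless the CURLOPT_RETURNTRANSFER option is set, $file_contents will not contain the response body. The body is instead printed directly to the output (the equivalent of echo $file_contents), so the browser that called your script receives the JavaScript and executes it.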
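As a minimal sketch of that point: with CURLOPT_RETURNTRANSFER enabled, the body stays in a PHP variable, where it can be inspected instead of being sent to the browser. The helper names `fetchBody` and `jsRedirectTargets` are hypothetical, and the regular expression only covers the simple quoted `window.location="..."` form shown in the response above.

```php
<?php
// Hypothetical helper: fetch a URL and return the body as a string
// instead of letting curl print it.
function fetchBody(string $url, string $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // capture the body, do not echo it
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // send the stored session cookie
    $body = curl_exec($ch);                            // string on success, false on failure
    curl_close($ch);
    return $body;
}

// Hypothetical helper: list the window.location targets found in a body,
// so the script can decide what to request next.
function jsRedirectTargets(string $html): array
{
    preg_match_all('/window\.location\s*=\s*["\']([^"\']+)["\']/', $html, $matches);
    return $matches[1];
}
```

Because nothing is echoed, the calling browser never sees the script block. Any follow-up request to one of the listed targets would have to be made explicitly with curl.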
How strange: everything works fine when I set
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
I used the function below to get the redirected (final) URL of links on this page:
http://iprice.my/coupons/zalora/
function curlFileGetContents($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20100101 Firefox/16.0');
$result = curl_exec($ch);
$info = curl_getinfo($ch);
$effectiveUrl = $info['url'];
curl_close($ch);
return $effectiveUrl;
}
but I couldn't get anything. I wonder why? For example, I called curlFileGetContents('http://iprice.my/coupons/zalora/#007b9a6024d19edec91d04c2e92e143e744c83b6');
The URL isn't being redirected. The links you are referring to (which only appear on that page) are pop-unders. A look at the response headers from curl shows that the URL is not redirected:
curl --head http://iprice.my/coupons/zalora/#007b9a6024d19edec91d04c2e92e143e744c83b6
HTTP/1.1 200 OK
Server: nginx/1.4.6 (Ubuntu)
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Vary: Accept-Encoding
X-Powered-By: PHP/5.5.9-1ubuntu4.9
Cache-Control: no-cache
Date: Fri, 22 May 2015 15:06:31 GMT
X-iPrice-Cached: 1
X-iPrice-Cached-Type: Response
Your current approach won't work for what you want to do.
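One way to check this from PHP is to ask curl how many redirects it actually followed. The sketch below is illustrative only, with hypothetical helper names (`stripFragment`, `followRedirects`). It also drops the `#...` fragment first, since a fragment is handled on the client side and is not transmitted to the server in an HTTP request.

```php
<?php
// Hypothetical helper: remove the fragment, which is never sent to the server.
function stripFragment(string $url): string
{
    return explode('#', $url, 2)[0];
}

// Hypothetical helper: report the final URL and the number of HTTP redirects followed.
function followRedirects(string $url): array
{
    $ch = curl_init(stripFragment($url));
    curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only, like curl --head
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_exec($ch);
    $result = [
        'url'       => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'redirects' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
    ];
    curl_close($ch);
    return $result;
}
```

A redirect count of 0 would confirm that the server answered the original URL directly, which matches the 200 OK shown above.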
This is an old script that logs in to PayPal and no longer works. I had a working version but deleted it by accident, and now I'm trying to get an old backup working.
Initialize curl:
$ch = curl_init();
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_POST, 0);
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
curl_setopt($ch, CURLOPT_REFERER, '');
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.10) Gecko/2009042523 Ubuntu/9.04 (jaunty) Firefox/3.0.10');
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, PAYPAL_COOKIE_FILE);
curl_setopt($ch, CURLOPT_COOKIEJAR, PAYPAL_COOKIE_FILE);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
The cp_post_page function:
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $query_string);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
ob_start();
curl_exec($ch);
$response = ob_get_contents();
ob_end_clean();
return $response;
Rest of the script:
$response = cp_get_page($ch, 'https://www.paypal.com/ca/cgi-bin/webscr?cmd=_ship-now');
$query_string = "CONTEXT=" . $form_context. "&login_cmd=&login_params=&login_email=" . PAYPAL_EMAIL . "&login_password=" . PAYPAL_PASSWORD
. "&submit.x=Log%20In&form_charset=UTF-8&auth=$form_auth"
. "&browser_name=Firefox&browser_version=3&browser_version_full=3&operating_system=Linux";
$response = cp_post_page($ch, $form_action, $query_string); // <--- fails here
Response from the curl:
HTTP/1.1 200 OK
Server: Apache
Strict-Transport-Security: max-age=14400
Strict-Transport-Security: max-age=14400
Content-Type: text/html
DC: slc-a-origin-www-1.paypal.com
Date: Mon, 04 Nov 2013 23:45:19 GMT
Content-Length: 54
Connection: keep-alive
Set-Cookie: X-PP-SILOVER=name%3DLIVE5.WEB.1%26silo_version%3D880%26app%3Dslingshot%26TIME%3D2402383954; domain=.paypal.com; path=/; Secure; HttpOnly
Set-Cookie: X-PP-SILOVER=; Expires=Thu, 01 Jan 1970 00:00:01 GMT
Fatal Failure
Headers being sent:
POST /ca/cgi-bin/webscr?cmd=_flow&SESSION=[removed]&dispatch=[removed] HTTP/1.1
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.10) Gecko/2009042523 Ubuntu/9.04 (jaunty) Firefox/3.0.10
Host: www.paypal.com
Accept: */*
Referer: https://www.paypal.com/ca/cgi-bin/webscr?cmd=_ship-now
Cookie: aksession=1383609018~id=cookieHTdyV99GKjx4ataOFl8HX+fgn4AgJSYxaLCcm+N/2KWZPsBAQbqDZ0ek9tQy9J9/gwTMcvHTSYGX65BNgg10oVSLJurTnMsIlySSW7uFaZjrVKxpCVApCbxyp5lfygq/QA1GvRvOk0k=; DC=slc-a-origin-www-1.paypal.com; X-PP-SILOVER=name%3DLIVE5.WEB.1%26silo_version%3D880%26app%3Dslingshot%26TIME%3D2385606738; Apache=10.73.8.47.1383608717943546; navlns=0.0; pNTcMTtQfrJuaJiwEnWXQ6yNxfq=cRcI945EJH9tChcYgpHw1EwU36Z-qza0kxriR6IYWTNRRqPyItIzb1qCgt5K-W3DjQPwjI8yCfYaNInqtDcheZgtxQX9L7xLZM8pY7bKHS_XWsWt759waXfATBCGKYeusuJuPFdeRH2_qHRlS6s31k4inXdD-TZnRI8OEdaArFLEBx3t4-5d4NV5aeqdVSL8TuDf-kqWJFvs4Xzs2wdBEmpoocMLXGm_igzYEYHmP9KqDIUaXAMPiZUeMmPfAJiBxC8-EN5zJqI7dqs3-BgIPpCi5Is5IQe_84xDMHVBIAAgDgSUByR3-FkmBtPlfDB6rLoItmY0kT9L7yUZFW48kP3yNWHhWQ1o-InAmm; navcmd=_ship-now; cookie_check=yes; KHcl0EuY7AKSMgfvHl7J5E7hPtK=kCH9bvOH2hK9miohP-LJoRVMGNwgry4awBca8g8fKl4vOhFrS5fM82dAIUPhBGuKkYwgeLhgsPV2tVS-; cwrClyrK4LoCV1fydGbAxiNL6iG=yGLUV2f-3wPoFZV-vMEGW6jlZr7im-N5EYbO1KWk2loskGrFWziDsIkn0xLI8kQo5MScMg9TWryYuevj3SLa07p2m_IyjtTAa2W_iF4rbbPYPPsEdbypbjQjxWgd3RZw9IzCAaPJ3ZLS1R1-kJWYRevJBbZaqTApiIlRA8ALAZlOJ1g9ft_FL2GsOERgVaKpjz_aZcVeaKInfcPRHoGc9EMQTz9bFsIardyUhnQxw4Zu19vefkGYYk-1NCtLctqJ1jQ5HVn-3d7clgyddNul7JockOlurWRgPjbfkDjQ7-eXuleFhb9LkfUgpnQXPyvYTUmWh5QYnOEr9q_cRNFCRHs4vIvmhvaziv8Eyg7gFmg8v6--xOQy-a-gBqj7JEgd0kOKOWIRbneb1mm1Icd_o3lkuISss1xKvcIzXrx2scz3fq6Ys8z1VNWDNBK
Content-Length: 376
Content-Type: application/x-www-form-urlencoded
I verified that the query string contains the right data. It is not URL-encoded because, as I understand it, curl will encode it. I compared the POST data to a form submission from Chrome, and the same number of fields is sent to PayPal.
From what I can see, the correct POST data is being sent and the cookie exists, but the login fails because something is missing (I think it is a cookie issue; it is not a JavaScript issue, since I can log in with JavaScript disabled). I'm not sure what other troubleshooting steps I can take to pinpoint the issue.
I've solved the issue. One of the form inputs is shown below:
<input type="hidden" id="CONTEXT_CGI_VAR" name="CONTEXT" value="REMOVED">
The reference to CONTEXT has to be changed to CONTEXT_CGI_VAR for the curl request to work correctly. Why? No idea, even after analyzing a sample response.
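When a hidden field's id and name differ like this, it can help to list both for every hidden input in the login page, then compare them with what the browser actually submits. The following is a minimal sketch using PHP's DOM extension. The helper name `hiddenInputs` is hypothetical, and it does not explain why the field name matters here.

```php
<?php
// Hypothetical helper: return id, name and value for every hidden <input> in the HTML.
function hiddenInputs(string $html): array
{
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        if (strtolower($input->getAttribute('type')) !== 'hidden') {
            continue; // skip visible fields
        }
        $fields[] = [
            'id'    => $input->getAttribute('id'),
            'name'  => $input->getAttribute('name'),
            'value' => $input->getAttribute('value'),
        ];
    }
    return $fields;
}
```

For the input shown above, this yields one entry with id `CONTEXT_CGI_VAR`, name `CONTEXT` and value `REMOVED`, which makes it easy to try either key in the POST string.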