I am opening an HTTPS page using cURL. The page I request issues a redirect. I have set cURL to follow redirects, but I cannot get it to request the correct page. I tracked the same request in a browser, and the browser makes a different request from the one cURL makes. What can I do to correct this? The correct URL is shown in cURL's verbose output, right after "* Issue another request to this URL".
Here is a snippet of the output from cURL's verbose output:
< HTTP/1.1 302 Moved Temporarily
< Location: /XXX
< Content-Type: text/html; charset=UTF-8
< Date: Tue, 31 Dec 2013 15:51:46 GMT
< Expires: Tue, 31 Dec 2013 15:51:46 GMT
< Cache-Control: private, max-age=0
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< Server: GSE
< Alternate-Protocol: 443:quic
< Transfer-Encoding: chunked
<
* Ignoring the response-body
* Connection #0 to host 127.0.0.1 left intact
* Issue another request to this URL: 'XYYYZ'
* Re-using existing connection! (#0) with host 127.0.0.1
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0)
> GET /??? HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0
The PHP code I use follows:
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIE_FILE);
curl_setopt($ch, CURLOPT_COOKIEFILE, COOKIE_FILE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_PROXY, '127.0.0.1:8888');
$target = ADDR;
curl_setopt($ch, CURLOPT_URL, $target);
$page = curl_exec($ch);
cURL follows the Location: header, but be sure to send the exact headers the browser sends (Content-Language, Referer, and so on) using the CURLOPT_HTTPHEADER option, because some servers refuse connections to prevent automated requests. In Firefox, the Live HTTP Headers add-on shows what the browser sends.
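As a rough sketch of the header advice above: CURLOPT_HTTPHEADER expects a list of "Name: value" strings. The helper name build_header_lines() and the header values below are made up for illustration; copy the real values from your browser's request log.

```php
<?php
// Hypothetical helper: turn a name => value map into the "Name: value"
// strings that CURLOPT_HTTPHEADER expects.
function build_header_lines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

// Example values only; replace them with what your browser actually sends.
$browserHeaders = build_header_lines([
    'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language' => 'en-US,en;q=0.5',
    'Referer'         => 'https://example.com/login',
]);

// Usage with an existing handle $ch:
// curl_setopt($ch, CURLOPT_HTTPHEADER, $browserHeaders);
```

Keeping the headers in a map makes it easier to compare them one by one against the browser's request.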
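Also make sure the Location: header contains an absolute URL rather than a relative path. The original HTTP/1.1 specification (RFC 2616) required an absolute URI there; the later RFC 7231 allows relative references, which cURL resolves against the request URL.
If that doesn't work, you can set CURLOPT_HEADER, use curl_getinfo() to detect the 302, and follow the redirect manually.
Here is an example that does it manually, so you can check whether it would produce an infinite loop.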
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIE_FILE);
curl_setopt($ch, CURLOPT_COOKIEFILE, COOKIE_FILE);
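curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_HEADER, true); // include the response headers in $page so they can be parsed below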
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_PROXY, '127.0.0.1:8888');
$target = ADDR;
curl_setopt($ch, CURLOPT_URL, $target);
$page = curl_exec($ch);
$curl_info = curl_getinfo($ch);
if ($curl_info['http_code'] == 302 || $curl_info['http_code'] == 301)
{
$response_headers = substr($page, 0, $curl_info['header_size']);
if (preg_match('#^Location:\s*(.*)$#mi', $response_headers, $location_header))
{
// Call cURL again to follow the location; it is better to wrap the cURL calls in a function such as follow_location().
// $location_header is an array:
// $location_header[0] holds the whole line, e.g. "Location: http://example.com/"
// $location_header[1] holds the URL only, which you can pass back to cURL
echo trim($location_header[1]); // trim() drops the trailing "\r" of the header line
}
}
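When following the redirect by hand, a relative Location value such as "/XXX" has to be turned into a full URL before it is passed back to cURL. Below is a minimal sketch of that step; resolve_location() is a hypothetical helper, not part of the cURL extension, and it does not normalise "../" segments.

```php
<?php
// Hypothetical helper: resolve a Location header value against the URL that
// produced it. Covers absolute, scheme-relative, root-relative, and simple
// relative values.
function resolve_location(string $baseUrl, string $location): string
{
    $location = trim($location); // header lines end in "\r\n"

    // Already absolute, e.g. "https://host/path"
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $location)) {
        return $location;
    }

    $base = parse_url($baseUrl);
    $scheme = $base['scheme'];
    $authority = $base['host'] . (isset($base['port']) ? ':' . $base['port'] : '');

    // Scheme-relative, e.g. "//host/path"
    if (strpos($location, '//') === 0) {
        return $scheme . ':' . $location;
    }

    // Root-relative, e.g. "/XXX"
    if (strpos($location, '/') === 0) {
        return $scheme . '://' . $authority . $location;
    }

    // Relative to the directory of the base path
    $path = isset($base['path']) ? $base['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $scheme . '://' . $authority . $dir . $location;
}
```

The result can then be set with CURLOPT_URL for the next request; counting how many times you do this is a simple guard against a redirect loop.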
I'm trying to create a REST API and am looking for a way to log in using PHP. The documentation provides a login example in Python, but I don't know how to do the same in PHP. Is there a PHP equivalent of the Python code below?
See below code:
def login():
    global sessionID
    req = urllib2.Request("https://<host>/appserver/j_spring_security_check")
    req.add_data(urllib.urlencode({"j_username": "admin", "j_password": "demoserver"}))
    res = opener.open(req)
    # Get the value of the JSESSIONID cookie
    sessionID = getCookie("JSESSIONID", cookies)
    response = res.read()
    return
What is the login script (PHP version) that I can use if I need to login to web service using PHP (considering the Python example)?
Additional information:
Logging into the web service requires a JSON object as the request body with user name and password:
Successful execution of the method will return a Cookie session Id
Example request JSON: {"j_username" : "username", "j_password":"*******"}
The user needs to parse the cookies and extract the cookie with the key JSESSIONID. This JSESSIONID value must be added manually to the headers of all REST calls:
"Cookie": "JSESSIONID=" + cookieValue
Another example using Python:
# Request for all apps
global sessionID
sID = "JSESSIONID="+sessionID
uri = "https://<hostname>/appserver/portal/api/1.0/apps"
req = urllib2.Request(uri)
req.add_header("Content-Type", "application/json")
req.add_header("Cookie", sID) # Header
req.get_method = lambda: "GET" # Method Type
res = opener.open(req) # URL Call
response = res.read()
return response
Request headers:
Host: 192.168.100.100:444
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://192.168.100.100:444/appserver/portal/login;jsessionid=6AD37194D43AB02BB79E26C71554958F
Cookie: JSESSIONID=6AD37194D43AB02BB79E26C71554958F
Connection: keep-alive
Upgrade-Insecure-Requests: 1
----------
When I tried curl on Linux, this is the command I used:
curl -k -i -H "Content-type: application/x-www-form-urlencoded" -c cookies.txt -X POST https://192.168.100.100:444/appserver/j_spring_security_check -d "j_username=admin&j_password=demoserver"
Here's the result of the Linux curl call, which I believe succeeded in connecting, since I was redirected to the welcome page.
HTTP/1.1 302 Found
Date: Thu, 16 Feb 2017 18:41:59 GMT
Server: Apache/2.2.26 (Unix) mod_ssl/2.2.25 OpenSSL/1.0.1e mod_jk/1.2.37
Set-Cookie: JSESSIONID=358446CC1F87B2D698D48AFECA373691; Path=/appserver/; HttpOnly
Location: https://192.168.100.100:444/appserver/portal/welcome;jsessionid=358446CC1F87B2D698D48AFECA373691
Content-Length: 0
Access-Control-Allow-Origin: *
Content-Type: text/plain
----------
But when I tried PHP cURL with the code below, I still could not connect.
<?php
$ch = curl_init();
$url = "https://192.168.100.100:444/appserver/j_spring_security_check";
$postData = 'j_username=admin&j_password=demoserver';
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_POST, 1); // -X
curl_setopt($ch, CURLOPT_POSTFIELDS,$postData); // -d
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'application/x-www-form-urlencoded'
)); // -H
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt'); // -c
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt'); // -c
curl_setopt($ch, CURLOPT_HEADER, true); // -i
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // -k
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
echo curl_exec ($ch);
curl_close ($ch);
This is the resulting header in my browser.
Request URL: http://localhost/curl.php
Request method: GET
Remote address: 127.0.0.1:80
Status code: 200 OK
Version: HTTP
Request headers:
Host: localhost
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Response headers:
Date: Thu, 16 Feb 2017 18:43:52 GMT
Server: Apache/2.2.26 (Unix) mod_ssl/2.2.25 OpenSSL/1.0.1e mod_jk/1.2.37
Content-Language: en-US
Content-Length: 4815
Access-Control-Allow-Origin: *
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html;charset=UTF-8
You need to use cURL.
Send the request to this URL: "https://<host>/appserver/j_spring_security_check"
with this POST data: "j_username=admin&j_password=demoserver"
So your code would look like this:
<?php
$ch = curl_init();
$url = "https://<host>/appserver/j_spring_security_check";
$postData = "j_username=admin&j_password=demoserver";
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS,$postData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$server_output = curl_exec ($ch);
echo $server_output;
curl_close ($ch);
?>
Maybe you need cookie support? Something like this:
<?php
// this is for cookie handling in the session
session_start();
$tmpFname = tempnam(sys_get_temp_dir(),"COOKIE");
if (isset($_SESSION['cookies'])) {
file_put_contents($tmpFname,$_SESSION['cookies']);
}
// the request
$ch = curl_init();
$url = "https://<host>/appserver/j_spring_security_check";
$postData = "j_username=admin&j_password=demoserver";
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS,$postData);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Content-Type: application/x-www-form-urlencoded'
// you may add more request headers here
));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// the next two options are for cookie handling
curl_setopt($ch, CURLOPT_COOKIEJAR, $tmpFname);
curl_setopt($ch, CURLOPT_COOKIEFILE, $tmpFname);
$server_output = curl_exec ($ch);
echo $server_output;
curl_close ($ch);
// this is for cookie handling in the session
$_SESSION['cookies'] = file_get_contents($tmpFname);
unlink($tmpFname);
I hope that helps.
Based on your edited question I think it would be best if you did:
<?php
$ch = curl_init();
$url = "https://<host>/appserver/j_spring_security_check";
$postData = '{"j_username":"admin","j_password":"demoserver"}';
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS,$postData);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Content-Type: application/json',
'Host: 192.168.100.100:444',
'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0',
'Referer: https://192.168.100.100:444/appserver/portal/login;jsessionid=6AD37194D43AB02BB79E26C71554958F',
'Cookie: JSESSIONID=6AD37194D43AB02BB79E26C71554958F'
));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // see comment
$server_output = curl_exec ($ch);
echo $server_output;
curl_close ($ch);
I think this suits your question, in its current form, better.
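The question also asks how to parse the cookies and pass JSESSIONID on later calls. One way to sketch that step, assuming the login response was fetched with CURLOPT_HEADER enabled so the raw headers are available as a string; extract_cookie() is a hypothetical helper, and the sample headers are copied from the 302 response shown in the question.

```php
<?php
// Hypothetical helper: pull one cookie value out of a raw response header
// block, such as the one returned when CURLOPT_HEADER is enabled.
function extract_cookie(string $rawHeaders, string $name): ?string
{
    $pattern = '/^Set-Cookie:\s*' . preg_quote($name, '/') . '=([^;\r\n]*)/mi';
    if (preg_match($pattern, $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

// Sample taken from the 302 response shown in the question.
$sample = "HTTP/1.1 302 Found\r\n"
    . "Set-Cookie: JSESSIONID=358446CC1F87B2D698D48AFECA373691; Path=/appserver/; HttpOnly\r\n"
    . "Content-Length: 0\r\n\r\n";

$sessionId = extract_cookie($sample, 'JSESSIONID');

// Header line to send on later REST calls via CURLOPT_HTTPHEADER.
$cookieHeader = 'Cookie: JSESSIONID=' . $sessionId;
```

If you let cURL manage the cookies through CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE instead, this manual step is not needed.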
Hey there, I have been looking for a solution to this problem for a while now, but no luck so far. Basically, I want to pull down a page's content using cURL in PHP. The following is the code:
static function getContent($url) {
// pull down the content that the url pointing to
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_VERBOSE, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_BINARYTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_AUTOREFERER, true);
curl_setopt($curl, CURLOPT_USERAGENT, Constants::$USER_AGENT_CHROME);
$cookie = realpath(Constants::$ROOT_DIR . Constants::$COOKIE);
curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($curl, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($curl, CURLOPT_FAILONERROR, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
// curl_setopt ($curl, CURLOPT_CAINFO, dirname(__FILE__).'/cacert.pem');
$content = curl_exec($curl);
curl_close($curl);
return $content;
}
Calling the function with the following URL always returns empty content, although I have had no problem so far with the other URLs (from different domains) that I tried.
$url = 'https://www.etsy.com/listing/150723421/iretrofone-20-steampunk-silver';
Any reason why?
[EDIT] I ran this script on Amazon Linux; something might be missing on the machine that exposes the issue. The two answers so far didn't work for me.
[EDIT] The following is the curl_getinfo output
{"url":"https:\/\/www.etsy.com\/listing\/150723421\/iretrofone-20-steampunk-silver","content_type":"text\/html; charset=UTF-8","http_code":200,"header_size":737,"request_size":287,"filetime":-1,"ssl_verify_result":0,"redirect_count":0,"total_time":0.404801,"namelookup_time":0.028505,"connect_time":0.065447,"pretransfer_time":0.243564,"size_upload":0,"size_download":0,"speed_download":0,"speed_upload":0,"download_content_length":0,"upload_content_length":-1,"starttransfer_time":0.40422,"redirect_time":0,"redirect_url":"","primary_ip":"199.27.79.249","certinfo":[],"primary_port":443,"local_ip":"172.31.29.192","local_port":44605}
[EDIT] the following is the verbose output
* Trying 23.41.253.83...
* Connected to www.etsy.com (23.41.253.83) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
* subject: CN=*.etsy.com,OU=Ops,O=Etsy Inc,L=Secaucus,ST=AL,C=US
* start date: Feb 17 18:11:39 2015 GMT
* expire date: Feb 17 18:11:37 2016 GMT
* common name: *.etsy.com
* issuer: CN=Verizon Akamai SureServer CA G14-SHA2,OU=Cybertrust,O=Verizon Enterprise Solutions,L=Amsterdam,C=NL
> GET /listing/150723421/iretrofone-20-steampunk-silver HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36
Host: www.etsy.com
Accept: */*
Cookie: uaid=uaid%3DYtum0fFFHW4vd8Fy0IIrtOqKsfXg%26_now%3D1446093672%26_slt%3DDsQSnzXs%26_kid%3D1%26_ver%3D1%26_mac%3DsGZ19jZbFEmxLRCZ87q_mSuvLbRtRjH4LjAYFO74NGg.
< HTTP/1.1 200 OK
< Server: Apache
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Content-Length: 0
< X-Cnection: close
< Content-Type: text/html; charset=UTF-8
< Date: Fri, 30 Oct 2015 16:20:53 GMT
< Connection: keep-alive
* Replaced cookie uaid="uaid%3DYtum0fFFHW4vd8Fy0IIrtOqKsfXg%26_now%3D1446222053%26_slt%3D2FNk-6Hh%26_kid%3D1%26_ver%3D1%26_mac%3DsgAm5o2-yY7aTA7Zt0H4gbSfoCf57mdL9KRraF65fig." for domain etsy.com, path /, expire 1480408753
< Set-Cookie: uaid=uaid%3DYtum0fFFHW4vd8Fy0IIrtOqKsfXg%26_now%3D1446222053%26_slt%3D2FNk-6Hh%26_kid%3D1%26_ver%3D1%26_mac%3DsgAm5o2-yY7aTA7Zt0H4gbSfoCf57mdL9KRraF65fig.; expires=Tue, 29-Nov-2016 08:39:13 GMT; Max-Age=34186700; path=/; domain=.etsy.com; httponly
<
* Connection #0 to host www.etsy.com left intact
Try this code; it works without cookies.
<?php
function getContent($url) {
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, false);
$data = curl_exec($curl);
curl_close($curl);
return $data;
}
$url = 'https://www.etsy.com/listing/150723421/iretrofone-20-steampunk-silver';
echo $a = getContent($url);
?>
I have a PHP app which posts variable info to a Flask app (which does some calculations and returns a result). I'm running both locally on Win7.
When I test the URL "127.0.0.1:5000/index" using a POST with Postman, I get a 200 status code. However, when the PHP app posts to the Flask app I get:
The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.
I'm using cURL, and the verbose output is:
* About to connect() to 127.0.0.1 port 5000 (#0)
* Trying 127.0.0.1...
* connected
* Connected to 127.0.0.1 (127.0.0.1) port 5000 (#0)
> POST /index/ HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
Host: 127.0.0.1:5000
Accept: */*
Content-Length: 338
Expect: 100-continue
Content-Type: multipart/form-data; boundary=----------------------------d5cb02e2edea
< HTTP/1.1 100 Continue
* HTTP 1.0, assume close after body
< HTTP/1.0 404 NOT FOUND
< Content-Type: text/html
< Content-Length: 233
< Server: Werkzeug/0.10.4 Python/2.7.5
My PHP code looks like:
$data= array('a'=>$a, 'token'=>$token);
$url="http://127.0.0.1:5000/index/";
$output = $this->my_model->get_data($url, $data);
public function get_data($url,$postFieldArray=FALSE) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)");
if ($postFieldArray!= FALSE) {
curl_setopt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postFieldArray); //for django
}
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_STDERR, $verbose);
curl_setopt($ch, CURLOPT_URL, $url);
$html = curl_exec($ch);
curl_close($ch);
.......
return $result;
}
Simplified Flask app:
from flask import Flask, request
app = Flask(__name__)
app.debug = True
@app.route('/')
def hello_world():
    return 'Hello World!'
@app.route('/index', methods=['POST'])
def index():
    token = request.form['token']
    a = request.form['a']
    ......
    return
if __name__ == '__main__':
    app.run()
What am I doing wrong?
You have a trailing slash in the $url variable in the PHP code. That won't work, because the route in your Flask code has no trailing slash: Flask answers a request for /index/ with a 404 when the rule is declared as '/index'. For more info, see the "Unique URLs / Redirection Behavior" section of the Flask documentation.
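On the PHP side, the fix can be as small as removing the slash before posting. A minimal sketch, assuming the Flask rule stays '/index'; without_trailing_slash() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: drop trailing slashes from the path part of a URL so
// that it matches a Flask rule declared as '/index'.
function without_trailing_slash(string $url): string
{
    $parts = explode('?', $url, 2);      // keep any query string intact
    $parts[0] = rtrim($parts[0], '/');
    return implode('?', $parts);
}

$url = without_trailing_slash('http://127.0.0.1:5000/index/');
```

The alternative is to declare the Flask rule as '/index/', in which case Flask redirects requests for /index to the slashed form.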
I am writing a cURL script to get the current day's interest rate from the Fannie Mae website, which is served over HTTPS. I haven't been able to get past the CURLOPT_SSL_VERIFYPEER, true option.
No username or password is required, however I need SSL verification turned on.
Testing on XAMPP dev server.
I downloaded the .crt and .pem certificates from the website using Firefox, saved them in the same directory as the script, and pointed to both using CURLOPT_CAINFO, with no luck.
I also downloaded the latest cacert.pem file from http://curl.haxx.se/ca/cacert.pem and pointed to that using CURLOPT_CAINFO, again with no luck.
If I set CURLOPT_SSL_VERIFYPEER to false I can retrieve the header (see below); when I set it to true there is no header.
I tried about 7-8 solutions found by searching here, read the PHP documentation on cURL, and tried several of the workarounds listed there, with no luck.
I need to be able to retrieve the header, and eventually the body, with CURLOPT_SSL_VERIFYPEER set to true.
Any help is appreciated.
<?php
// script is designed to access an https site and retrieve the last table showing the most recent 90 day commitment for the Fannie Mae 30 year fixed rate mortgage. Site is designed to work with cookies and has a valid SSL cert.
//turn error reporting on
error_reporting(E_ALL); ini_set("display_errors", 1);
// cookie file name/location
$cookie_file_path = "cookies.txt";
// verify if cookie file is accessible and writable
if (! file_exists($cookie_file_path) || ! is_writable($cookie_file_path))
{
echo 'Cookie file missing or not writable.';
exit;
}
// url connection
$url = "https://www.fanniemae.com/content/datagrid/hist_net_yields/cur30.html";
// Initiate connection
$ch = curl_init();
// Set cURL and other options
curl_setopt($ch, CURLOPT_URL, $url); // set url
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"); // set browser/user agent
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // automatically follow Location: headers (ie redirects)
curl_setopt($ch, CURLOPT_AUTOREFERER, 1); // auto set the referer in the event of a redirect
curl_setopt($ch, CURLOPT_MAXREDIRS, 5); // make sure we dont get stuck in a loop
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // 10s timeout time for cURL connection
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // allow https verification if true
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); // check common name and verify with host name
curl_setopt($ch, CURLOPT_SSLVERSION,3); // verify ssl version 2 or 3
curl_setopt($ch, CURLOPT_CAINFO, getcwd() . "VeriSignClass3PublicPrimaryCertificationAuthority-G5.pem"); // allow ssl cert direct comparison
curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'read_header'); // get header
curl_setopt($ch, CURLOPT_NOBODY, true); // exclude body
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE); // set new cookie session
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path); // file to save cookies in
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path); // file to read cookies in
// grab URL and pass it to the browser
curl_exec($ch);
// close cURL connection, save cookie file, free up system resources
curl_close($ch);
// show header
function read_header($ch, $string) {
print "Received header: $string";
return strlen($string);
}
?>
This is the header received when CURLOPT_SSL_VERIFYPEER is set to false; the output is blank when it is set to true.
Received header: HTTP/1.1 200 OK
Received header: Date: Thu, 19 Sep 2013 00:40:16 GMT
Received header: Server: Apache
Received header: Set-Cookie: JSESSIONID=4297C1E1760A836F691FE821FBF8B805.cportal-cl01; Path=/; Secure; HttpOnly
Received header: Cache-Control: no-store
Received header: Expires: Wed, 31 Dec 1969 23:59:59 GMT
Received header: Pragma: no-cache
Received header: X-FRAME-OPTIONS: SAMEORIGIN
Received header: Content-Language: en-US
Received header: Content-Length: 9344
Received header: Content-Type: text/html;charset=ISO-8859-1
Received header:
You're excluding the body by using curl_setopt($ch, CURLOPT_NOBODY, true);. I also don't think you need to install a certificate on your machine. The following few lines will give you everything.
$url = 'https://www.fanniemae.com/content/datagrid/hist_net_yields/cur30.html';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // set url
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"); // set browser/user agent
curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'read_header'); // get header
curl_exec($ch);
function read_header($ch, $string) {
print "Received header: $string";
return strlen($string);
}
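The answer above turns verification off, while the question asks to keep it on. Two details in the question's code may be worth checking first: getcwd() . "VeriSign...pem" joins the directory and file name without a separator, so CURLOPT_CAINFO probably points at a file that does not exist, and CURLOPT_SSLVERSION set to 3 forces SSLv3. The sketch below keeps verification enabled under those assumptions; ca_bundle_path() and fetch_verified() are hypothetical helper names, and cacert.pem is assumed to be the bundle downloaded from the curl site.

```php
<?php
// Hypothetical helper: build a path to a CA bundle. Note the explicit
// separator; getcwd() returns a path without a trailing slash.
function ca_bundle_path(string $dir, string $file): string
{
    return rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $file;
}

// Hypothetical helper: fetch a URL with peer and host verification left on.
// Returns the body, or throws with cURL's own error message.
function fetch_verified(string $url, string $caBundle): string
{
    if (!is_readable($caBundle)) {
        throw new RuntimeException("CA bundle not readable: $caBundle");
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
    curl_setopt($ch, CURLOPT_CAINFO, $caBundle);
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL error: $error");
    }
    curl_close($ch);
    return $body;
}

// Usage (cacert.pem is assumed to sit next to this script):
// echo fetch_verified($url, ca_bundle_path(__DIR__, 'cacert.pem'));
```

Surfacing curl_error() matters here: with verification on, a failed handshake makes curl_exec() return false, which explains a blank page with no headers.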
Using cURL to scrape a secure (i.e. login) page, and I'm at my wits' end. I managed to successfully scrape two sites with little or no problems, and now I just can't log into this one. cURL gets all the pages I ask it to, but they're all not logged in, which doesn't help. So maybe someone could spot a mistake I've missed?
The code is:
$url_to = 'http://fastorder.newrock.es/store2009/index.php/customer/account/loginPost/';
$url_from = 'http://fastorder.newrock.es/store2009/index.php/customer/account/login/';
$url_get = 'http://fastorder.newrock.es/store2009/index.php/';
$name_pass = 'login%5Busername%5D=*****&login%5Bpassword%5D=*****&send=';
function login($link,$user,$from) {
$fp = fopen("cookie.txt", "w");
fclose($fp);
$log = curl_init();
curl_setopt($log, CURLOPT_REFERER, $from);
curl_setopt($log, CURLOPT_URL, $link);
curl_setopt($log, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($log, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($log, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6");
curl_setopt($log, CURLOPT_TIMEOUT, 40);
curl_setopt($log, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($log, CURLOPT_HEADER, TRUE);
curl_setopt($log, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($log, CURLOPT_POST, TRUE);
curl_setopt($log, CURLOPT_POSTFIELDS, $user);
$data = curl_exec($log);
curl_close($log);
}
login($url_to,$name_pass,$url_from);
function get($url) {
$get = curl_init();
curl_setopt($get, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($get, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($get, CURLOPT_URL, $url);
return curl_exec ($get);
curl_close ($get);
}
$html = get($url_get);
echo $html;
This is (more or less) the same script that worked on the other two sites, where it manages to log in fine. What threw me off at the start are the codes in $name_pass. It turns out the site has named the name and password input fields login[username] and login[password]. Why the hell for, I've no idea, but I've tried sending it both with the codes and with brackets, and nothing helped.
Live HTTP Headers is giving me the following for the page:
http://fastorder.newrock.es/store2009/index.php/customer/account/loginPost/
POST /store2009/index.php/customer/account/loginPost/ HTTP/1.1
Host: fastorder.newrock.es
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://fastorder.newrock.es/store2009/index.php/customer/account/login/
Cookie: frontend=6tjul97q4mvn0046ier0k79li8
Content-Type: application/x-www-form-urlencoded
Content-Length: 81
login%5Busername%5D=*****&login%5Bpassword%5D=*****&send=
HTTP/1.1 302 Found
Date: Fri, 26 Feb 2010 12:29:19 GMT
Server: Apache/2.0.63 (CentOS)
X-Powered-By: PHP/5.2.10
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: http://fastorder.newrock.es/store2009/index.php/customer/account/
Content-Length: 0
Connection: close
Content-Type: text/html; charset=UTF-8
I've tried to copy everything I could to the cURL script, thinking there's some obscure way of blocking the script from logging in. But right now I'm totally stuck and I've got no idea what to do next. I've dug through a lot of tutorials, and they all give advice that worked like a charm for the first two sites.
Halp?
It may be this:
login%5Busername%5D=*****&login%5Bpassword%5D=*****&send=
I'm no curl guru, but your script seems to be OK, so maybe you should not escape the characters.
I would do local tests with curl and this kind of login forms. Maybe you can debug what's wrong from there. If I'm right, there will be empty fields.
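For that kind of local test, one option is to let PHP build the body rather than hand-escaping it. A small sketch with placeholder credentials: http_build_query() percent-encodes the brackets, and urldecode() shows the field names a form handler would receive.

```php
<?php
// Let PHP do the encoding; nested arrays become login[username]-style names.
// The credentials here are placeholders.
$fields = [
    'login' => ['username' => 'user@example.com', 'password' => 'secret'],
    'send'  => '',
];
$body = http_build_query($fields);

// Decoding shows the bracketed names the server-side form handler will see.
$decoded = urldecode($body);

// Usage with an existing handle $log:
// curl_setopt($log, CURLOPT_POSTFIELDS, $body);
```

Passing the string, rather than the array, to CURLOPT_POSTFIELDS keeps the request as application/x-www-form-urlencoded, which matches what the browser sent in the captured headers.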
Suggestion: Use Fiddler (www.fiddler2.com) to diff the request traffic, CURL vs your browser.
There is something broken with that store's registration/login. The activation email said to just log in to activate the account. I've tried logging in multiple times, but I get the error "This account is not activated." every time.
Below is a quick change that prints the returned login page.
$url_to = 'http://fastorder.newrock.es/store2009/index.php/customer/account/loginPost/';
$url_from = 'http://fastorder.newrock.es/store2009/index.php/customer/account/login/';
$url_get = 'http://fastorder.newrock.es/store2009/index.php/';
$name_pass = 'login%5Busername%5D=*****&login%5Bpassword%5D=*****&send=';
function login($link,$user,$from) {
$fp = fopen("cookie.txt", "w");
fclose($fp);
$log = curl_init();
curl_setopt($log, CURLOPT_REFERER, $from);
curl_setopt($log, CURLOPT_URL, $link);
curl_setopt($log, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($log, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($log, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6");
curl_setopt($log, CURLOPT_TIMEOUT, 40);
curl_setopt($log, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($log, CURLOPT_HEADER, TRUE);
curl_setopt($log, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($log, CURLOPT_POST, TRUE);
curl_setopt($log, CURLOPT_POSTFIELDS, $user);
$data = curl_exec($log);
curl_close($log);
return $data;
}
echo login($url_to,$name_pass,$url_from);
function get($url) {
$get = curl_init();
curl_setopt($get, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($get, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($get, CURLOPT_URL, $url);
$html = curl_exec($get);
curl_close($get); // close the handle before returning; code after a return never runs
return $html;
}
$html = get($url_get);
echo $html;
Edit:
Is the cookie data being written to the cookie file (cookie.txt)? If not...
Check the file permissions and make sure the file is writable.
A bug in earlier versions of php5 caused the cookies file option to be ignored.
Details on the bug are here: http://bugs.php.net/bug.php?id=33475
Solution: Add unset($log) after curl_close($log);
It's hard to debug this script without being able to test it.
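The permission check mentioned above can be done from the script itself before cURL runs. A minimal sketch; cookie_jar_ready() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: report whether cURL will be able to write the cookie
// jar at $path. An existing file must be writable; a missing file needs a
// writable parent directory so it can be created.
function cookie_jar_ready(string $path): bool
{
    if (!file_exists($path)) {
        return is_writable(dirname($path));
    }
    return is_file($path) && is_writable($path);
}

// Usage:
// if (!cookie_jar_ready('cookie.txt')) { die('Cookie file is not writable.'); }
```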