I've been having a really tough time with this, even though (in theory) it should be pretty straightforward.
My log-in attempt looks like this:
$curl = curl_init("https://login.example.com/login");
curl_setopt($curl, CURLOPT_POST, 1);
curl_setopt($curl, CURLOPT_POSTFIELDS, $data);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_USERAGENT, $user_agent);
curl_setopt($curl, CURLOPT_VERBOSE, TRUE);
curl_setopt($curl, CURLOPT_COOKIEFILE, 'cookie1.txt');
curl_setopt($curl, CURLOPT_COOKIEJAR, 'cookie2.txt');
$result = curl_exec ($curl);
curl_close($curl);
echo 'result: '.$result;
output:
Host: login.example.com
Accept: */*
Cookie: session=8892b5345209128c_57031fed.Q_lzpZ1aOLg3nAdgfWfP2BZWfOQ
Content-Length: 136
Content-Type: application/x-www-form-urlencoded
* upload completely sent off: 136 out of 136 bytes
< HTTP/1.1 302 FOUND
< Server: nginx
< Date: Tue, 05 Apr 2016 02:16:13 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 279
< Connection: keep-alive
< Location: https://page.example.com/dashboard/
* Added cookie logged_in="true" for domain example.com, path /, expire 1491358573
< Set-Cookie: logged_in=true; Domain=example.com; Expires=Wed, 05-Apr-2017 02:16:13 GMT; Secure; HttpOnly; Path=/
* Replaced cookie session="79859c1a698564c0_57031fed.a6asFkkozmLyRysHXCjotKCzwUg" for domain login.example.com, path /, expire 0
< Set-Cookie: session=79859c1a698564c0_57031fed.a6asFkkozmLyRysHXCjotKCzwUg; Domain=login.example.com; Secure; HttpOnly; Path=/
< X-example-App: login
< Strict-Transport-Security: max-age=0
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Cache-Control: max-age=0
< X-Frame-Options: SAMEORIGIN
There is a 302 redirect here, but rather than using CURLOPT_FOLLOWLOCATION I manually request the next URL, because the cookies appear to be reset when I examine the exchange in Chrome developer tools.
$curl = curl_init("https://page.example.com/dashboard/");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_USERAGENT, $user_agent);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($curl, CURLOPT_VERBOSE, TRUE);
curl_setopt($curl, CURLOPT_COOKIEFILE, 'cookie2.txt');
curl_setopt($curl, CURLOPT_COOKIEJAR, 'cookie3.txt');
$result = curl_exec ($curl);
curl_close($curl);
The output of this is:
Host: page.example.com
Accept: */*
Cookie: logged_in=true
< HTTP/1.1 403 Forbidden
< Server: nginx
< Date: Tue, 05 Apr 2016 02:16:13 GMT
< Content-Type: text/html
< Content-Length: 564
< Connection: keep-alive
< Strict-Transport-Security: max-age=0
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
<
Why is this page forbidden? When I look at the raw data from the live site, this appears to be exactly what the browser sends to the server. Why does it work in the browser but not with cURL? My research suggests this may have something to do with HttpOnly cookies, but I don't quite understand how.
I would very much appreciate if someone can take the time to review the above and provide me with any insight.
I also tried doing the same script via Selenium which also failed with the same error. That tells me there is some kind of cookie management that is just not working.
Thanks!
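One thing the verbose output hints at: the session cookie is scoped to Domain=login.example.com, while logged_in is scoped to Domain=example.com, so only logged_in is eligible to be sent to page.example.com. HttpOnly by itself only hides a cookie from JavaScript; cURL still stores and sends such cookies. The sketch below is a simplified version of cookie domain matching (no IP-address or public-suffix handling); the function names cookieDomainMatches and cookiesSentTo are made up for illustration, and whether the missing session cookie is what actually triggers the 403 is an assumption, not something the logs prove.

```php
<?php
// Simplified cookie domain matching in the spirit of RFC 6265.
// Assumption: hosts are plain DNS names (no IP addresses, no public-suffix checks).
function cookieDomainMatches(string $cookieDomain, string $host): bool
{
    $cookieDomain = strtolower(ltrim($cookieDomain, '.'));
    $host = strtolower($host);
    if ($host === $cookieDomain) {
        return true;
    }
    // A cookie for example.com is also sent to any subdomain of example.com.
    $suffix = '.' . $cookieDomain;
    return strlen($host) > strlen($suffix)
        && substr($host, -strlen($suffix)) === $suffix;
}

// Hypothetical helper: list the cookie names that would be sent to $host.
function cookiesSentTo(array $cookies, string $host): array
{
    $sent = [];
    foreach ($cookies as $name => $domain) {
        if (cookieDomainMatches($domain, $host)) {
            $sent[] = $name;
        }
    }
    return $sent;
}

// Cookie name => Domain attribute, copied from the Set-Cookie headers above.
$cookies = [
    'logged_in' => 'example.com',
    'session'   => 'login.example.com',
];

echo implode(', ', cookiesSentTo($cookies, 'page.example.com')), PHP_EOL; // prints "logged_in"
```

This matches the second request in the question, where the Cookie header contains only logged_in=true.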
Related
As per the title, I'm trying to use the PHP cURL library to download a repo from GitHub. Here's my code:
$url = 'https://api.github.com/repos/SeanPeterson/Raspberry-Pi-Case/zipball/master';
$curl = curl_init();
curl_setopt($curl, CURLOPT_HEADER, 1);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_USERAGENT, 'SeanPeterson');
$content = curl_exec($curl);
if(curl_errno($curl)){
$this->respond('error:' . curl_error($curl));
}
curl_close($curl);
The result of this code is a corrupt zip file that turns into a cpgz file when I attempt to unzip it. Here's the header info I see when opening the zip file in a text editor:
HTTP/1.1 302 Found
Server: GitHub.com
Date: Wed, 19 Dec 2018 18:47:52 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 0
Status: 302 Found
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 60
X-RateLimit-Reset: 1545248872
Cache-Control: public, must-revalidate, max-age=0
Expires: Wed, 19 Dec 2018 18:47:52 GMT
Location: https://codeload.github.com/SeanPeterson/Raspberry-Pi-Case/legacy.zip/master
Access-Control-Expose-Headers: ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type
Access-Control-Allow-Origin: *
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
X-Frame-Options: deny
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Referrer-Policy: origin-when-cross-origin, strict-origin-when-cross-origin
Content-Security-Policy: default-src 'none'
X-GitHub-Request-Id: D01E:1971:33F0754:7030B7C:5C1A9258
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Access-Control-Allow-Origin: https://render.githubusercontent.com
Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
Strict-Transport-Security: max-age=31536000
Vary: Authorization,Accept-Encoding
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
ETag: "09b31138d130b657cea3c3b5e12191fa7f48c558"
Content-Type: application/zip
Content-Disposition: attachment; filename=SeanPeterson-Raspberry-Pi-Case-09b3113.zip
X-Geo-Block-List:
Date: Wed, 19 Dec 2018 18:47:53 GMT
X-GitHub-Request-Id: D01F:77CB:DA101:20BE73:5C1A9258
When hitting the link with the browser, it downloads the file perfectly. So from this, I assume that I must be making a mistake with how I'm using curl.
Any insight is much appreciated!
Try this:
$url = 'https://api.github.com/repos/SeanPeterson/Raspberry-Pi-Case/zipball/master';
$curl = curl_init();
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_USERAGENT, 'SeanPeterson');
$content = curl_exec($curl);
$err = curl_error($curl);
curl_close($curl);
if ($err) {
echo "cURL Error #:" . $err;
}
$fp = fopen("test.zip","wb");
fwrite($fp,$content);
fclose($fp);
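I think including the header (CURLOPT_HEADER) is what corrupts the zip: with that option on, the response headers are written to the start of the returned data, ahead of the zip bytes.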
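If the headers are still needed, one option is to keep CURLOPT_HEADER on and cut the headers off before writing the file. This is only a sketch: splitResponse is a hypothetical helper, and the sample response is hand-made for illustration. In real use the header size would come from curl_getinfo($curl, CURLINFO_HEADER_SIZE), which covers the headers of every response in a redirect chain.

```php
<?php
// Hypothetical helper: split a raw response fetched with CURLOPT_HEADER
// into its header part and its body part.
function splitResponse(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => (string) substr($raw, $headerSize),
    ];
}

// Hand-made sample standing in for curl_exec() output (not real GitHub data).
$headers = "HTTP/1.1 200 OK\r\nContent-Type: application/zip\r\n\r\n";
$raw = $headers . "PK\x03\x04rest-of-archive";

// In real code: $headerSize = curl_getinfo($curl, CURLINFO_HEADER_SIZE);
$parts = splitResponse($raw, strlen($headers));

// A zip file starts with the bytes "PK\x03\x04".
echo bin2hex(substr($parts['body'], 0, 4)), PHP_EOL; // prints "504b0304"
```

Only $parts['body'] should then be passed to fwrite().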
I'm trying to scrape a page using cURL, but none of my attempts work.
Here's my code:
public function curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    curl_close($ch);
    // Pick out a redirect target, if the response carries one.
    $loc = null;
    if (preg_match('#Location: (.*)#', $data, $r)) {
        $loc = trim($r[1]);
    }
    // Debug output; var_dump() prints by itself, so no echo is needed.
    echo "<pre>";
    var_dump($data);
    echo "</pre>";
    echo "<pre>";
    var_dump($loc);
    echo "</pre>";
    die(); // debugging only: the return below is never reached
    return $data;
}
The response I get by running that is the following:
HTTP/1.1 503 Service Temporarily Unavailable
Date: Wed, 28 Dec 2016 20:29:28 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: close
Set-Cookie: __cfduid=d6f3effa0b8c33cd8092e9f003d5c751c1482956968; expires=Thu, 28-Dec-17 20:29:28 GMT; path=/; domain=.thedomaintoscrape.com; HttpOnly
X-Frame-Options: SAMEORIGIN
Refresh: 8;URL=/cdn-cgi/l/chk_jschl?pass=1482956972.162-3LFzqX3Gdh
Cache-Control: no-cache
Server: cloudflare-nginx
CF-RAY: 3187c3bb054a551c-ORD
I don't really know what to make of it as I don't understand what the problem is. Can anyone help me?
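Note that this response carries no Location header, so the preg_match in the code above finds nothing; the redirect hint is in the Refresh header instead. A rough sketch of reading it follows (parseRefreshHeader is a hypothetical name). Requesting that URL alone will probably not get past the check, since this kind of Cloudflare page normally expects an answer computed by JavaScript in the browser; that part is an assumption based on the chk_jschl path, not something visible in the headers.

```php
<?php
// Hypothetical helper: extract the delay and target from a "Refresh" header
// inside a raw header block. Returns null when no such header is present.
function parseRefreshHeader(string $rawHeaders): ?array
{
    if (!preg_match('/^Refresh:\s*(\d+)\s*;\s*URL=(.*)$/mi', $rawHeaders, $m)) {
        return null;
    }
    // trim() removes the trailing "\r" left over from the CRLF line ending.
    return ['delay' => (int) $m[1], 'url' => trim($m[2])];
}

// Header lines copied from the response in the question.
$rawHeaders = "HTTP/1.1 503 Service Temporarily Unavailable\r\n"
    . "X-Frame-Options: SAMEORIGIN\r\n"
    . "Refresh: 8;URL=/cdn-cgi/l/chk_jschl?pass=1482956972.162-3LFzqX3Gdh\r\n"
    . "Cache-Control: no-cache\r\n";

$refresh = parseRefreshHeader($rawHeaders);
echo $refresh['delay'], ' ', $refresh['url'], PHP_EOL;
```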
Hey there, I have been looking for a solution to this problem for a while now, but no luck so far. Basically, I want to pull down a page's content using cURL in PHP. The following is the code:
static function getContent($url) {
// pull down the content that the url pointing to
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_VERBOSE, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_BINARYTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_AUTOREFERER, true);
curl_setopt($curl, CURLOPT_USERAGENT, Constants::$USER_AGENT_CHROME);
$cookie = realpath(Constants::$ROOT_DIR . Constants::$COOKIE);
curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($curl, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($curl, CURLOPT_FAILONERROR, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
// curl_setopt ($curl, CURLOPT_CAINFO, dirname(__FILE__).'/cacert.pem');
$content = curl_exec($curl);
curl_close($curl);
return $content;
}
Calling the function with the following URL always returns empty content, although I have had no problems so far with the other URLs (from different domains) that I tried.
$url = 'https://www.etsy.com/listing/150723421/iretrofone-20-steampunk-silver';
Any reason why?
[EDIT] I ran this script on Amazon Linux; something might be missing on the machine that exposes the issue. The two answers so far didn't work for me.
[EDIT] The following is the curl_getinfo output
{"url":"https:\/\/www.etsy.com\/listing\/150723421\/iretrofone-20-steampunk-silver","content_type":"text\/html; charset=UTF-8","http_code":200,"header_size":737,"request_size":287,"filetime":-1,"ssl_verify_result":0,"redirect_count":0,"total_time":0.404801,"namelookup_time":0.028505,"connect_time":0.065447,"pretransfer_time":0.243564,"size_upload":0,"size_download":0,"speed_download":0,"speed_upload":0,"download_content_length":0,"upload_content_length":-1,"starttransfer_time":0.40422,"redirect_time":0,"redirect_url":"","primary_ip":"199.27.79.249","certinfo":[],"primary_port":443,"local_ip":"172.31.29.192","local_port":44605}
[EDIT] the following is the verbose output
* Trying 23.41.253.83...
* Connected to www.etsy.com (23.41.253.83) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
* subject: CN=*.etsy.com,OU=Ops,O=Etsy Inc,L=Secaucus,ST=AL,C=US
* start date: Feb 17 18:11:39 2015 GMT
* expire date: Feb 17 18:11:37 2016 GMT
* common name: *.etsy.com
* issuer: CN=Verizon Akamai SureServer CA G14-SHA2,OU=Cybertrust,O=Verizon Enterprise Solutions,L=Amsterdam,C=NL
> GET /listing/150723421/iretrofone-20-steampunk-silver HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36
Host: www.etsy.com
Accept: */*
Cookie: uaid=uaid%3DYtum0fFFHW4vd8Fy0IIrtOqKsfXg%26_now%3D1446093672%26_slt%3DDsQSnzXs%26_kid%3D1%26_ver%3D1%26_mac%3DsGZ19jZbFEmxLRCZ87q_mSuvLbRtRjH4LjAYFO74NGg.
< HTTP/1.1 200 OK
< Server: Apache
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Content-Length: 0
< X-Cnection: close
< Content-Type: text/html; charset=UTF-8
< Date: Fri, 30 Oct 2015 16:20:53 GMT
< Connection: keep-alive
* Replaced cookie uaid="uaid%3DYtum0fFFHW4vd8Fy0IIrtOqKsfXg%26_now%3D1446222053%26_slt%3D2FNk-6Hh%26_kid%3D1%26_ver%3D1%26_mac%3DsgAm5o2-yY7aTA7Zt0H4gbSfoCf57mdL9KRraF65fig." for domain etsy.com, path /, expire 1480408753
< Set-Cookie: uaid=uaid%3DYtum0fFFHW4vd8Fy0IIrtOqKsfXg%26_now%3D1446222053%26_slt%3D2FNk-6Hh%26_kid%3D1%26_ver%3D1%26_mac%3DsgAm5o2-yY7aTA7Zt0H4gbSfoCf57mdL9KRraF65fig.; expires=Tue, 29-Nov-2016 08:39:13 GMT; Max-Age=34186700; path=/; domain=.etsy.com; httponly
<
* Connection #0 to host www.etsy.com left intact
Try this code; it works without cookies.
<?php
function getContent($url) {
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, false);
$data = curl_exec($curl);
curl_close($curl);
return $data;
}
$url = 'https://www.etsy.com/listing/150723421/iretrofone-20-steampunk-silver';
echo $a = getContent($url);
?>
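The curl_getinfo output in the question (http_code 200, size_download 0) suggests the transfer itself succeeded and the server simply returned an empty body, which is a different situation from a failed request. A small sketch for telling these cases apart; describeTransfer is a hypothetical helper and the wording of its messages is arbitrary.

```php
<?php
// Hypothetical helper: summarise a curl_getinfo() array in one line.
function describeTransfer(array $info): string
{
    $code  = (int) ($info['http_code'] ?? 0);
    $bytes = (int) ($info['size_download'] ?? 0);
    if ($code === 0) {
        return 'no HTTP response (connection or TLS failure)';
    }
    if ($code >= 400) {
        return "HTTP error $code";
    }
    if ($bytes === 0) {
        return "HTTP $code with an empty body";
    }
    return "HTTP $code, $bytes bytes";
}

// Fields taken from the curl_getinfo output in the question.
$info = json_decode('{"http_code":200,"size_download":0,"download_content_length":0}', true);

echo describeTransfer($info), PHP_EOL; // prints "HTTP 200 with an empty body"
```

In real code the array would come from curl_getinfo($curl), called before curl_close($curl).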
I get an error every time I try to merge a pull request with the Bitbucket API.
This is my code:
define('USERNAME','***');
define('PASSWORD','***');
$url = "https://bitbucket.org/api/2.0/repositories/{owner}/{repo}/pullrequests/46/merge";
$curl1 = curl_init();
curl_setopt($curl1, CURLOPT_HTTPAUTH, CURLAUTH_BASIC );
curl_setopt($curl1, CURLOPT_USERPWD, USERNAME . ":" . PASSWORD);
curl_setopt($curl1, CURLOPT_HEADER, true);
curl_setopt($curl1, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl1, CURLOPT_URL, $url);
curl_setopt($curl1, CURLOPT_POST, true);
echo curl_exec($curl1);
And this is the error code:
HTTP/1.1 100 Continue
HTTP/1.1 400 BAD REQUEST
Server: nginx/1.6.2
Date: Wed, 20 May 2015 15:43:27 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 113
Connection: keep-alive
X-Render-Time: 0.218739032745
Content-Language: de
ETag: "2f0273bc2b819d7505bc14bf84d7e129"
X-Request-Count: 227
X-Served-By: app19
Vary: Authorization, Accept-Language, Cookie
X-Frame-Options: SAMEORIGIN
X-Static-Version: 3b0c7aec39d3
X-Version: c288eef4a422
{"error": {"message": "'ascii' codec can't encode character u'\\xe4' in position 11: ordinal not in range(128)"}}
I've already tried sending the information (owner, repo and request id) with CURLOPT_POSTFIELDS, but I got the same error.
Can somebody help me?
I have the following code:
$curl=curl_init();
$allowAppPostPArams = 'utf8=%E2%9C%93&authenticity_token='.$this->authToken.'&permissions=manage_likes%2Cmanage_relationships%2Cmanage_comments%2C&response_type=code&app_id=d2a4fc6d2751&redirect_uri=http%3A%2F%2FsocialCamFollower%2Findex.php&choice=authorize&commit=Authorize';
curl_setopt($curl, CURLOPT_POSTFIELDS, $allowAppPostPArams);
curl_setopt($curl, CURLOPT_POST, 1);
curl_setopt($curl, CURLOPT_COOKIEFILE, $cookie_file);
curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie_file);
curl_setopt($curl, CURLOPT_URL, 'https://socialcam.com/oauth/form_submit');
curl_setopt($curl, CURLOPT_REFERER, $urlAppPage);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20100101 Firefox/17.0');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($curl);
curl_close($curl);
Browser's requests:
CURL LOG
PROBLEM: Why does cURL show 200 instead of 302?
Server: MAMP, Apache2, php5.3.14. safe mode & open_basedir is not set.
Use this
curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'read_header');
...
curl_exec($ch);
...
function read_header($curl, $header) {
echo $header;
return strlen($header);
}
With CURLOPT_FOLLOWLOCATION enabled, you should get output like this:
HTTP/1.0 301 Moved Permanently
Date: Fri, 20 Apr 2012 11:26:37 GMT
Server: Apache
Location: http://www.spiegel.de/
Content-Length: 230
Content-Type: text/html; charset=iso-8859-1
X-Cache: MISS from lnxp-3968.srv.mediaways.net
X-Cache-Lookup: MISS from lnxp-3968.srv.mediaways.net:91
Via: 1.0 lnxp-3968.srv.mediaways.net (squid/3.1.4)
Connection: close
HTTP/1.0 200 OK
Date: Fri, 20 Apr 2012 11:25:38 GMT
Server: Apache-Coyote/1.1
X-Powered-By: Servlet 2.4; JBoss-4.0.3SP1 (build: CVSTag=JBoss_4_0_3_SP1 date=200510231054)/Tomcat-5.5
Cache-Control: max-age=120
Expires: Fri, 20 Apr 2012 11:27:38 GMT
X-Host: lnxp-2885
X-Robots-Tag: index, follow, noarchive
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 161305
Vary: Accept-Encoding
Age: 59
X-Cache: HIT from lnxp-3954.srv.mediaways.net
X-Cache-Lookup: HIT from lnxp-3954.srv.mediaways.net:90
Via: 1.1 www.spiegel.de, 1.0 lnxp-3954.srv.mediaways.net (squid/3.1.4)
Connection: close
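A dump like the one above shows why cURL reports 200: with CURLOPT_FOLLOWLOCATION on, the redirect response comes first and the final response last, and the reported status is that of the last one. As a rough sketch, the header lines handed to the callback can be reduced to the chain of status codes; statusChain is a hypothetical helper, and the sample lines are shortened from the output above.

```php
<?php
// Hypothetical helper: pull the status code out of every status line
// seen by a CURLOPT_HEADERFUNCTION callback.
function statusChain(array $headerLines): array
{
    $codes = [];
    foreach ($headerLines as $line) {
        // Status lines look like "HTTP/1.0 301 Moved Permanently" or "HTTP/2 200".
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $codes[] = (int) $m[1];
        }
    }
    return $codes;
}

// In real code the lines would be collected by the callback, for example:
//   curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, $line) use (&$lines) {
//       $lines[] = $line;
//       return strlen($line); // cURL expects the number of bytes handled
//   });
$lines = [
    "HTTP/1.0 301 Moved Permanently\r\n",
    "Location: http://www.spiegel.de/\r\n",
    "\r\n",
    "HTTP/1.0 200 OK\r\n",
    "Content-Length: 161305\r\n",
    "\r\n",
];

echo implode(' -> ', statusChain($lines)), PHP_EOL; // prints "301 -> 200"
```

To see the 302 itself as the final result, CURLOPT_FOLLOWLOCATION would have to be turned off.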