As per the title, I'm trying to use the PHP cURL library to download a repo from GitHub. Here's my code:
$url = 'https://api.github.com/repos/SeanPeterson/Raspberry-Pi-Case/zipball/master';
$curl = curl_init();
curl_setopt($curl, CURLOPT_HEADER, 1);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_USERAGENT, 'SeanPeterson');
$content = curl_exec($curl);
if (curl_errno($curl)) {
    $this->respond('error: ' . curl_error($curl));
}
curl_close($curl);
The result of this code is a corrupt zip file that turns into a .cpgz file when I attempt to unzip it. Here's the header info I see when opening the zip file in a text editor:
HTTP/1.1 302 Found
Server: GitHub.com
Date: Wed, 19 Dec 2018 18:47:52 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 0
Status: 302 Found
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 60
X-RateLimit-Reset: 1545248872
Cache-Control: public, must-revalidate, max-age=0
Expires: Wed, 19 Dec 2018 18:47:52 GMT
Location: https://codeload.github.com/SeanPeterson/Raspberry-Pi-Case/legacy.zip/master
Access-Control-Expose-Headers: ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type
Access-Control-Allow-Origin: *
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
X-Frame-Options: deny
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Referrer-Policy: origin-when-cross-origin, strict-origin-when-cross-origin
Content-Security-Policy: default-src 'none'
X-GitHub-Request-Id: D01E:1971:33F0754:7030B7C:5C1A9258
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Access-Control-Allow-Origin: https://render.githubusercontent.com
Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
Strict-Transport-Security: max-age=31536000
Vary: Authorization,Accept-Encoding
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
ETag: "09b31138d130b657cea3c3b5e12191fa7f48c558"
Content-Type: application/zip
Content-Disposition: attachment; filename=SeanPeterson-Raspberry-Pi-Case-09b3113.zip
X-Geo-Block-List:
Date: Wed, 19 Dec 2018 18:47:53 GMT
X-GitHub-Request-Id: D01F:77CB:DA101:20BE73:5C1A9258
When hitting the link with the browser, it downloads the file perfectly. So from this, I assume that I must be making a mistake in how I'm using cURL.
Any insight is much appreciated!
Try this:
$url = 'https://api.github.com/repos/SeanPeterson/Raspberry-Pi-Case/zipball/master';
$curl = curl_init();
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_USERAGENT, 'SeanPeterson');
$content = curl_exec($curl);
$err = curl_error($curl);
curl_close($curl);
if ($err) {
echo "cURL Error #:" . $err;
}
$fp = fopen("test.zip","wb");
fwrite($fp,$content);
fclose($fp);
I think including the header is what's corrupting the zip.
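As a follow-up sketch (same URL as the question; the curl extension is assumed), you can also avoid buffering the whole archive in memory by streaming the body straight to disk with CURLOPT_FILE. Note that CURLOPT_HEADER is deliberately left off, so no headers end up inside the zip:

```php
<?php
// Sketch: stream the zipball directly to a file instead of buffering it.
// URL and user agent are taken from the question; the output name is arbitrary.
$url = 'https://api.github.com/repos/SeanPeterson/Raspberry-Pi-Case/zipball/master';

$fp = fopen('repo.zip', 'wb');                  // binary-safe write mode
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_FILE, $fp);          // write the body straight to $fp
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);  // follow the 302 to codeload.github.com
curl_setopt($curl, CURLOPT_USERAGENT, 'SeanPeterson'); // GitHub's API requires a UA
// CURLOPT_HEADER is intentionally NOT set, so no headers land in the file.
curl_exec($curl);
$ok = (curl_errno($curl) === 0);
curl_close($curl);
fclose($fp);
```

If the download succeeds, `repo.zip` contains only the archive bytes, with no HTTP headers prepended.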
Related
I'm trying to scrape a page with cURL, but none of my attempts work.
Here's my code:
PHP
public function curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    $loc = null;
    if (preg_match('#Location: (.*)#', $data, $r)) {
        $loc = trim($r[1]);
    }
    echo "<pre>";
    var_dump($data);
    echo "</pre>";
    echo "<pre>";
    var_dump($loc);
    echo "</pre>";
    die();
    return $data;
}
The response I get by running that is the following:
HTTP/1.1 503 Service Temporarily Unavailable
Date: Wed, 28 Dec 2016 20:29:28 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: close
Set-Cookie: __cfduid=d6f3effa0b8c33cd8092e9f003d5c751c1482956968; expires=Thu, 28-Dec-17 20:29:28 GMT; path=/; domain=.thedomaintoscrape.com; HttpOnly
X-Frame-Options: SAMEORIGIN
Refresh: 8;URL=/cdn-cgi/l/chk_jschl?pass=1482956972.162-3LFzqX3Gdh
Cache-Control: no-cache
Server: cloudflare-nginx
CF-RAY: 3187c3bb054a551c-ORD
I don't really know what to make of it as I don't understand what the problem is. Can anyone help me?
I've been really having a tough time with this. This (in theory) should be pretty straightforward.
My log-in attempt looks like this:
$curl = curl_init("https://login.example.com/login");
curl_setopt($curl, CURLOPT_POST, 1);
curl_setopt($curl, CURLOPT_POSTFIELDS, $data);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_USERAGENT, $user_agent);
curl_setopt($curl, CURLOPT_VERBOSE, TRUE);
curl_setopt($curl, CURLOPT_COOKIEFILE, 'cookie1.txt');
curl_setopt($curl, CURLOPT_COOKIEJAR, 'cookie2.txt');
$result = curl_exec ($curl);
curl_close($curl);
echo 'result: '.$result;
output:
Host: login.example.com
Accept: */*
Cookie: session=8892b5345209128c_57031fed.Q_lzpZ1aOLg3nAdgfWfP2BZWfOQ
Content-Length: 136
Content-Type: application/x-www-form-urlencoded
* upload completely sent off: 136 out of 136 bytes
< HTTP/1.1 302 FOUND
< Server: nginx
< Date: Tue, 05 Apr 2016 02:16:13 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 279
< Connection: keep-alive
< Location: https://page.example.com/dashboard/
* Added cookie logged_in="true" for domain example.com, path /, expire 1491358573
< Set-Cookie: logged_in=true; Domain=example.com; Expires=Wed, 05-Apr-2017 02:16:13 GMT; Secure; HttpOnly; Path=/
* Replaced cookie session="79859c1a698564c0_57031fed.a6asFkkozmLyRysHXCjotKCzwUg" for domain login.example.com, path /, expire 0
< Set-Cookie: session=79859c1a698564c0_57031fed.a6asFkkozmLyRysHXCjotKCzwUg; Domain=login.example.com; Secure; HttpOnly; Path=/
< X-example-App: login
< Strict-Transport-Security: max-age=0
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Cache-Control: max-age=0
< X-Frame-Options: SAMEORIGIN
There is a 302 redirect here, but rather than using FOLLOWLOCATION I manually call the next URL, because the cookies appear to be reset when I examine them in the Chrome developer tools.
$curl = curl_init("https://page.example.com/dashboard/");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_USERAGENT, $user_agent);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($curl, CURLOPT_VERBOSE, TRUE);
curl_setopt($curl, CURLOPT_COOKIEFILE, 'cookie2.txt');
curl_setopt($curl, CURLOPT_COOKIEJAR, 'cookie3.txt');
$result = curl_exec ($curl);
curl_close($curl);
The output of this is:
Host: page.example.com
Accept: */*
Cookie: logged_in=true
< HTTP/1.1 403 Forbidden
< Server: nginx
< Date: Tue, 05 Apr 2016 02:16:13 GMT
< Content-Type: text/html
< Content-Length: 564
< Connection: keep-alive
< Strict-Transport-Security: max-age=0
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
<
Why is this page forbidden? When I look at the raw data from the live site, this appears to be exactly what the browser sends to the server. Why does it work in the browser but not with cURL? My research suggests this may have something to do with HttpOnly cookies, but I don't quite understand how.
I would very much appreciate if someone can take the time to review the above and provide me with any insight.
I also tried doing the same script via Selenium which also failed with the same error. That tells me there is some kind of cookie management that is just not working.
Thanks!
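One thing worth checking is whether both requests share the same cookie jar; splitting them across cookie1.txt/cookie2.txt/cookie3.txt makes it easy to lose a cookie between steps. A sketch (hostnames from the question; the form fields are hypothetical placeholders) that reuses one jar for both the login POST and the follow-up GET:

```php
<?php
// Sketch: reuse ONE cookie jar for both requests, so every cookie set during
// login is available (for its scoped domain) on the follow-up request.
$jar  = tempnam(sys_get_temp_dir(), 'cookies'); // one shared jar file
$data = 'user=me&pass=secret';                  // hypothetical form fields

// Step 1: log in.
$curl = curl_init('https://login.example.com/login');
curl_setopt($curl, CURLOPT_POST, 1);
curl_setopt($curl, CURLOPT_POSTFIELDS, $data);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_COOKIEFILE, $jar); // read cookies from the jar...
curl_setopt($curl, CURLOPT_COOKIEJAR, $jar);  // ...and write new ones back to it
curl_exec($curl);
curl_close($curl);

// Step 2: fetch the dashboard with the SAME jar.
$curl = curl_init('https://page.example.com/dashboard/');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_COOKIEFILE, $jar);
curl_setopt($curl, CURLOPT_COOKIEJAR, $jar);
$result = curl_exec($curl);
curl_close($curl);
```

Also note: in your verbose output the session cookie is scoped to `Domain=login.example.com`, so cURL is arguably correct not to send it to `page.example.com`. It may be worth checking in the browser which domain the working session cookie is actually set for.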
I am trying to download a file with cURL. I have received a URL where the file is located, but the URL redirects before reaching the file. For some reason I always receive the logout page when I access the URL with cURL, yet when I enter the URL directly in my browser the file downloads as it is supposed to. The file that should be downloaded is a RAR file, but instead of the file I get the incorrect login page.
This the current code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "url");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$result = curl_exec($ch);
curl_close($ch);
print_r($result);
As you can see I am using the following code to allow the redirects:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
But I always receive the incorrect login page from the website if I use the above code. Can anyone see what I am doing wrong here?
This is the response I get from the server:
HTTP/1.1 302 Found
Date: Tue, 26 Jan 2016 15:33:18 GMT
Server: Apache/2.2.3 (Red Hat)
X-Powered-By: PHP/5.3.2
Set-Cookie: PHPSESSID=session_id; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: /nl/nl/export/download/t/Mg==/c/MTQ=/
Vary: Accept-Encoding
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Try this code; it streams the response straight into a file:
$url = "www.abc.com/xyz"; // your URL goes here
$fp = fopen('test.rar', 'wb'); // binary-safe write mode; name the file to match the download
$ch = curl_init(str_replace(" ", "%20", $url)); // crude escaping of spaces in the URL
curl_setopt($ch, CURLOPT_TIMEOUT, 100);
curl_setopt($ch, CURLOPT_FILE, $fp); // write the body directly to the file handle
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow the 302 redirect
curl_exec($ch);
curl_close($ch);
fclose($fp);
I'm trying to work off the Harvest PHP API example from its GitHub page.
Here is the code, but there is no output no matter what I try to do. If I use the Chrome Postman app I can retrieve data. I'm not sure what I'm doing wrong. Confidential information was removed. Any help is much appreciated.
<?php
$url = "https://company.harvestapp.com/?daily=";
$temp = getURL($url);
echo $temp;

function getURL($url) {
    $credentials = "<email>:<password>";
    $headers = array(
        "Content-Type: application/xml",
        "Accept: application/xml",
        "Authorization-Basic: " . base64_encode($credentials)
    );
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_VERBOSE, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_USERAGENT, "test app site");
    $data = curl_exec($ch);
    curl_close($ch);
    if ($data != false) {
        return $data;
    }
    return "no data";
}
?>
This is the error I get from the curl_error function:
HTTP/1.1 401 Unauthorized
Server: nginx
Date: Fri, 09 May 2014 22:59:27 GMT
Content-Type: application/xml; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Status: 401 Unauthorized
Cache-Control: private, no-store, no-cache, max-age=0, must-revalidate
X-Frame-Options: SAMEORIGIN
X-Served-From: https://.harvestapp.com/
X-UA-Compatible: IE=Edge,chrome=1
Set-Cookie: _harvest_sess=; domain=.harvestapp.com; path=/; secure; HttpOnly
X-Request-Id: e9d6eb74-5c73-430e-9854-ebf6f1c527c8
X-Runtime: 0.019223

You must be authenticated with Harvest to complete this request.
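A likely culprit for the 401: `Authorization-Basic:` is not a standard header name, so the server never sees the credentials. HTTP Basic auth uses `Authorization: Basic <base64>`. A sketch of building it (the credentials are placeholders):

```php
<?php
// Sketch: standard HTTP Basic auth header. "Authorization-Basic:" is not a
// recognized header, so the credentials never reach the server that way.
$credentials = 'user@example.com:password'; // placeholder credentials
$headers = array(
    'Content-Type: application/xml',
    'Accept: application/xml',
    'Authorization: Basic ' . base64_encode($credentials), // correct form
);
// Alternatively, let cURL build the header itself:
// curl_setopt($ch, CURLOPT_USERPWD, $credentials);
// curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
```

Either form sends the same `Authorization: Basic ...` header; CURLOPT_USERPWD just saves you the manual base64 step.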
I have the following code:
$curl=curl_init();
$allowAppPostPArams = 'utf8=%E2%9C%93&authenticity_token='.$this->authToken.'&permissions=manage_likes%2Cmanage_relationships%2Cmanage_comments%2C&response_type=code&app_id=d2a4fc6d2751&redirect_uri=http%3A%2F%2FsocialCamFollower%2Findex.php&choice=authorize&commit=Authorize';
curl_setopt($curl, CURLOPT_POSTFIELDS, $allowAppPostPArams);
curl_setopt($curl, CURLOPT_POST, 1);
curl_setopt($curl, CURLOPT_COOKIEFILE, $cookie_file);
curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie_file);
curl_setopt($curl, CURLOPT_URL, 'https://socialcam.com/oauth/form_submit');
curl_setopt($curl, CURLOPT_REFERER, $urlAppPage);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20100101 Firefox/17.0');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($curl);
curl_close($curl);
Browser's requests:
CURL LOG
PROBLEM: Why does cURL show 200 instead of 302?
Server: MAMP, Apache 2, PHP 5.3.14; safe mode and open_basedir are not set.
Use this:
curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'read_header');
...
curl_exec($ch);
...
function read_header($curl, $header) {
    echo $header;
    return strlen($header); // cURL requires the number of bytes handled
}
With CURLOPT_FOLLOWLOCATION enabled you should get output like:
HTTP/1.0 301 Moved Permanently
Date: Fri, 20 Apr 2012 11:26:37 GMT
Server: Apache
Location: http://www.spiegel.de/
Content-Length: 230
Content-Type: text/html; charset=iso-8859-1
X-Cache: MISS from lnxp-3968.srv.mediaways.net
X-Cache-Lookup: MISS from lnxp-3968.srv.mediaways.net:91
Via: 1.0 lnxp-3968.srv.mediaways.net (squid/3.1.4)
Connection: close
HTTP/1.0 200 OK
Date: Fri, 20 Apr 2012 11:25:38 GMT
Server: Apache-Coyote/1.1
X-Powered-By: Servlet 2.4; JBoss-4.0.3SP1 (build: CVSTag=JBoss_4_0_3_SP1 date=200510231054)/Tomcat-5.5
Cache-Control: max-age=120
Expires: Fri, 20 Apr 2012 11:27:38 GMT
X-Host: lnxp-2885
X-Robots-Tag: index, follow, noarchive
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 161305
Vary: Accept-Encoding
Age: 59
X-Cache: HIT from lnxp-3954.srv.mediaways.net
X-Cache-Lookup: HIT from lnxp-3954.srv.mediaways.net:90
Via: 1.1 www.spiegel.de, 1.0 lnxp-3954.srv.mediaways.net (squid/3.1.4)
Connection: close
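If you only need to know whether a redirect happened, rather than dumping every header, curl_getinfo() can report the redirect count, final URL, and final status code after the transfer. A minimal sketch (the URL is just an example matching the headers above):

```php
<?php
// Sketch: inspect redirect behavior via curl_getinfo() instead of raw headers.
$ch = curl_init('http://spiegel.de/'); // example URL; redirects to www.spiegel.de
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
$redirects = curl_getinfo($ch, CURLINFO_REDIRECT_COUNT); // how many hops were followed
$finalUrl  = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);  // where the transfer ended up
$status    = curl_getinfo($ch, CURLINFO_HTTP_CODE);      // status of the FINAL response
curl_close($ch);
```

This also explains the "200 instead of 302" confusion: with FOLLOWLOCATION on, the status you see at the end is the final response's, and the intermediate 302 only shows up in the redirect count or the raw header stream.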