PHP cURL not being allowed? - php

I'm trying to scrape a page using cURL, but none of my attempts work.
Here's my code:
public function curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    $loc = null;
    if(preg_match('#Location: (.*)#', $data, $r)) {
        $loc = trim($r[1]);
    }
    echo "<pre>";
    echo var_dump($data);
    echo "</pre>";
    echo "<pre>";
    echo var_dump($loc);
    echo "</pre>";
    die();
    return $data;
}
The response I get by running that is the following:
HTTP/1.1 503 Service Temporarily Unavailable
Date: Wed, 28 Dec 2016 20:29:28 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: close
Set-Cookie: __cfduid=d6f3effa0b8c33cd8092e9f003d5c751c1482956968; expires=Thu, 28-Dec-17 20:29:28 GMT; path=/; domain=.thedomaintoscrape.com; HttpOnly
X-Frame-Options: SAMEORIGIN
Refresh: 8;URL=/cdn-cgi/l/chk_jschl?pass=1482956972.162-3LFzqX3Gdh
Cache-Control: no-cache
Server: cloudflare-nginx
CF-RAY: 3187c3bb054a551c-ORD
I don't really know what to make of it as I don't understand what the problem is. Can anyone help me?
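For what it's worth, the headers suggest that cURL itself is working: the request reached the site and was answered by Cloudflare (Server: cloudflare-nginx, CF-RAY) with a 503 and a Refresh pointing at /cdn-cgi/l/chk_jschl, which looks like a browser check that a plain cURL request does not complete. Below is a rough sketch for recognising that kind of response in code. It assumes the header patterns shown above, and the function name is made up for illustration.

```php
<?php
// Hypothetical helper: guesses whether raw response headers look like the
// Cloudflare browser-check response shown above (a 503 status, a cloudflare
// Server header, and a Refresh header pointing at /cdn-cgi/).
function looksLikeCloudflareCheck(string $rawHeaders): bool
{
    $is503        = (bool) preg_match('#^HTTP/\S+\s+503\b#', $rawHeaders);
    $isCloudflare = (bool) preg_match('#^Server:\s*cloudflare#mi', $rawHeaders);
    $hasChallenge = (bool) preg_match('#^Refresh:.*?/cdn-cgi/#mi', $rawHeaders);
    return $is503 && $isCloudflare && $hasChallenge;
}
```

Calling looksLikeCloudflareCheck($data) on the string returned by curl_exec() would let the scraper report the block clearly instead of treating the 503 page as content.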

Related

Trouble using PHP curl to download github repo

As per the title, I'm trying to use the php curl library to download a repo from github. Here's my code:
$url = 'https://api.github.com/repos/SeanPeterson/Raspberry-Pi-Case/zipball/master';
$curl = curl_init();
curl_setopt($curl, CURLOPT_HEADER, 1);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_USERAGENT, 'SeanPeterson');
$content = curl_exec($curl);
if(curl_errno($curl)){
    $this->respond('error:' . curl_error($curl));
}
curl_close($curl);
The result of this code is a corrupt zip file that turns into a cpgz file when I try to unzip it. Here's the header info I see when opening the zip file in a text editor:
HTTP/1.1 302 Found
Server: GitHub.com
Date: Wed, 19 Dec 2018 18:47:52 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 0
Status: 302 Found
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 60
X-RateLimit-Reset: 1545248872
Cache-Control: public, must-revalidate, max-age=0
Expires: Wed, 19 Dec 2018 18:47:52 GMT
Location: https://codeload.github.com/SeanPeterson/Raspberry-Pi-Case/legacy.zip/master
Access-Control-Expose-Headers: ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type
Access-Control-Allow-Origin: *
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
X-Frame-Options: deny
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Referrer-Policy: origin-when-cross-origin, strict-origin-when-cross-origin
Content-Security-Policy: default-src 'none'
X-GitHub-Request-Id: D01E:1971:33F0754:7030B7C:5C1A9258
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Access-Control-Allow-Origin: https://render.githubusercontent.com
Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
Strict-Transport-Security: max-age=31536000
Vary: Authorization,Accept-Encoding
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
ETag: "09b31138d130b657cea3c3b5e12191fa7f48c558"
Content-Type: application/zip
Content-Disposition: attachment; filename=SeanPeterson-Raspberry-Pi-Case-09b3113.zip
X-Geo-Block-List:
Date: Wed, 19 Dec 2018 18:47:53 GMT
X-GitHub-Request-Id: D01F:77CB:DA101:20BE73:5C1A9258
When I open the link in a browser, it downloads the file perfectly. So from this, I assume I must be making a mistake in how I'm using curl.
Any insight is much appreciated!
Try this:
$url = 'https://api.github.com/repos/SeanPeterson/Raspberry-Pi-Case/zipball/master';
$curl = curl_init();
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_USERAGENT, 'SeanPeterson');
$content = curl_exec($curl);
$err = curl_error($curl);
curl_close($curl);
if ($err) {
    echo "cURL Error #:" . $err;
}
$fp = fopen("test.zip","wb");
fwrite($fp,$content);
fclose($fp);
I think including the headers in the output (CURLOPT_HEADER) is what corrupts the zip.
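To spell out why: with CURLOPT_HEADER enabled, curl_exec() returns every header block (one per redirect hop) followed by the body, so writing that string to disk puts text in front of the zip data. If you still want the headers, one option is to cut them off using CURLINFO_HEADER_SIZE. A minimal sketch; the helper name splitHeadersFromBody is made up for illustration:

```php
<?php
// Hypothetical helper: separates the header block(s) from the body of a
// response that was fetched with CURLOPT_HEADER enabled.
// $headerSize is the value of curl_getinfo($curl, CURLINFO_HEADER_SIZE),
// which covers all header blocks, including those of followed redirects.
function splitHeadersFromBody(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

// Usage with a real handle (not executed here):
//   $raw   = curl_exec($curl);
//   $parts = splitHeadersFromBody($raw, curl_getinfo($curl, CURLINFO_HEADER_SIZE));
//   file_put_contents('test.zip', $parts['body']);
```

The header size has to be read before curl_close() is called on the handle.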

PHP XML error 415 when sending a POST with cURL

I am using PHP cURL to send XML via CURLOPT_POSTFIELDS, but I am getting error 415 Unsupported Media Type.
Here is my code:
$data = '<?xml version="1.0" encoding="euc-kr" ?>
<Product>
<selMnbdNckNm>Catenzo YI 099</selMnbdNckNm>
<selMthdCd>01</selMthdCd>
<dispCtgrNo></dispCtgrNo>
<prdAttrCd></prdAttrCd>
<dispCtgrNo></dispCtgrNo>
<prdAttrVal><prdAttrVal>
<prdNm></prdNm>
<prdStatCd></prdStatCd>
<prdWght></prdWght>
<dlvGrntYn></dlvGrntYn>
<minorSelCnYn></minorSelCnYn>
<suplDtyfrPrdClfCd></suplDtyfrPrdClfCd>
<prdImage01></prdImage01>
<prdImage02></prdImage02>
<prdImage03></prdImage03>
<prdImage04></prdImage04>
<prdImage05></prdImage05>
<htmlDetail></htmlDetail>
<selTermUseYn></selTermUseYn>
<selPrc></selPrc>
<prdSelQty></prdSelQty>
<asDetail></asDetail>
<rtngExchDetail></rtngExchDetail>
</Product>';
$data = simplexml_load_string($data);
$ch = curl_init();
$header = array(
    "Content-Type: application/xml",
    "Accept-Charset: utf-8",
    "openapikey:myapikey",
    'Content-Length: ' . strlen($data)
);
curl_setopt($ch, CURLOPT_URL, "http://dev.api.elevenia.co.id/rest/prodservices/product");
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SAFE_UPLOAD, false);
//curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
$return=curl_exec($ch);
echo "[MSG] Result -Xml : \n";
echo $return;
I get this error message:
HTTP/1.1 415 Unsupported Media Type
Date: Wed, 03 Jan 2018 18:19:00 GMT
Server: Apache
Cache-Control: no-cache
Cache-Control: no-store
Pragma: no-cache
Content-Length: 903
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Set-Cookie: WMONID=RgWtbnOnqnT; expires=Thu, 03-Jan-2019 18:19:00 GMT; path=/
X-Powered-By: Servlet/2.5 JSP/2.1
Vary: User-Agent
Content-Type: text/html; charset=UTF-8
I hope someone can help me.
Thanks to #astrangeloop and #frz3993. I had two small errors: the header array was named $header but passed to cURL as $headers, and the closing tag of the prdAttrVal element was written as a second opening tag.
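For anyone hitting the same 415, here is a sketch that combines the two fixes mentioned above (one consistent variable name for the headers, well-formed XML). It also sends the raw XML string rather than the result of simplexml_load_string(), which returns a SimpleXMLElement (or false for malformed XML), not a string. The function names are made up, and the URL and API key are placeholders supplied by the caller.

```php
<?php
// Returns true when $xml parses as well-formed XML.
function isWellFormedXml(string $xml): bool
{
    $previous = libxml_use_internal_errors(true); // collect parser errors instead of emitting warnings
    $parsed = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $parsed !== false;
}

// Hypothetical helper: POSTs an XML string with an explicit Content-Type.
function postXml(string $url, string $xml, string $apiKey)
{
    $headers = [
        'Content-Type: application/xml',
        'Accept-Charset: utf-8',
        'openapikey: ' . $apiKey,
    ];
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers); // same variable name as defined above
    curl_setopt($ch, CURLOPT_POSTFIELDS, $xml);     // string body; cURL sets Content-Length itself
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response; // string on success, false on failure
}
```

Checking isWellFormedXml($data) before calling postXml() would have caught the broken prdAttrVal tag before the request was sent.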

Download a file with cURL

I am trying to download a file with cURL. I have received a URL where the file is located, but the URL redirects before reaching the file. For some reason I always receive the login page when I access the URL with cURL, but when I enter the URL directly in my browser the file downloads as it is supposed to. The file that should be downloaded is a RAR file, but instead of the file I get the login page.
This is the current code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "url");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$result = curl_exec($ch);
curl_close($ch);
print_r($result);
As you can see I am using the following code to allow the redirects:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
But I always receive the login page from the website instead of the file when I use the above code. Can anyone see what I am doing wrong here?
This is the response I get from the server:
HTTP/1.1 302 Found
Date: Tue, 26 Jan 2016 15:33:18 GMT
Server: Apache/2.2.3 (Red Hat)
X-Powered-By: PHP/5.3.2
Set-Cookie: PHPSESSID=session_id; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: /nl/nl/export/download/t/Mg==/c/MTQ=/
Vary: Accept-Encoding
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Try this code:
$url = "www.abc.com/xyz";//your url will come here
$fp = fopen ('test.txt', 'w+');
$ch = curl_init(str_replace(" ","%20",$url));
curl_setopt($ch, CURLOPT_TIMEOUT, 100);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
curl_close($ch);
fclose($fp);
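One more thing that may be worth trying, though this is an assumption based only on the response above: the 302 sets a PHPSESSID cookie before redirecting, and cURL does not send cookies back on the redirected request unless its cookie engine is enabled. Passing an empty string to CURLOPT_COOKIEFILE turns on in-memory cookie handling. A sketch with a made-up function name:

```php
<?php
// Hypothetical helper: streams $url to $targetPath, following redirects and
// keeping cookies (such as PHPSESSID) between the hops of one transfer.
function downloadWithCookies(string $url, string $targetPath): bool
{
    $fp = fopen($targetPath, 'wb'); // binary mode, since the file is a RAR archive
    if ($fp === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);            // write the body straight to the file
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');       // empty string: enable in-memory cookie handling
    $ok = curl_exec($ch) !== false;
    curl_close($ch);
    fclose($fp);
    return $ok;
}
```

If the site requires an actual login first, this alone will not be enough; the login request would have to be made on the same handle (or with a shared cookie file) before the download.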

Harvest API php no output

I'm trying to work off the Harvest php API example from the GitHub page.
Here is the code, but there is no output no matter what I try to do. If I use the Chrome Postman app I can retrieve data. I'm not sure what I'm doing wrong. Confidential information was removed. Any help is much appreciated.
<?php
$url = "https://company.harvestapp.com/?daily=";
$temp = getURL($url);
echo $temp;
function getURL($url) {
    $credentials = "<email>:<password>";
    $headers = array(
        "Content-Type: application/xml",
        "Accept: application/xml",
        "Authorization-Basic: " . base64_encode($credentials)
    );
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_VERBOSE, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_USERAGENT, "test app site");
    $data = curl_exec($ch);
    curl_close($ch);
    if ($data !=false){
        return $data;
    }
    else
        return "no data";
}
?>
This is the error I get from the curl_error function:
HTTP/1.1 401 Unauthorized
Server: nginx
Date: Fri, 09 May 2014 22:59:27 GMT
Content-Type: application/xml; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Status: 401 Unauthorized
Cache-Control: private, no-store, no-cache, max-age=0, must-revalidate
X-Frame-Options: SAMEORIGIN
X-Served-From: https://.harvestapp.com/
X-UA-Compatible: IE=Edge,chrome=1
Set-Cookie: _harvest_sess=; domain=.harvestapp.com; path=/; secure; HttpOnly
X-Request-Id: e9d6eb74-5c73-430e-9854-ebf6f1c527c8
X-Runtime: 0.019223
You must be authenticated with Harvest to complete this request.
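The 401 is consistent with the header built in the code: HTTP Basic authentication is sent as a header named Authorization whose value starts with "Basic ", whereas the code sends a header named Authorization-Basic, which a server would most likely not treat as credentials. A minimal sketch of the standard form; the helper name is made up for illustration:

```php
<?php
// Hypothetical helper: builds a standard HTTP Basic authentication header line.
function basicAuthHeader(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

// Usage inside getURL() (placeholders, not executed here):
//   $headers = array(
//       "Content-Type: application/xml",
//       "Accept: application/xml",
//       basicAuthHeader("<email>", "<password>"),
//   );
// Alternatively, let cURL build the header itself:
//   curl_setopt($ch, CURLOPT_USERPWD, "<email>:<password>");
```

Whether the rest of the request (for example the URL) matches what the Harvest API expects is not something this sketch checks.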

Retrieving a page that has a redirect

Okay updated it all. I use this function:
private function get_follow_url($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $a = curl_exec($ch);
    if(preg_match('Location:(.*)', $a, $r)){
        $url=trim($r[1]);
        $this->get_follow_url($url);
    }
    return $a;
}
I get this when it is echoed:
HTTP/1.1 301 Moved Permanently
Server: nginx/0.7.67
Date: Sun, 14 Oct 2012 10:03:21 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
X-Powered-By: PHP/5.2.17
X-Pingback: http://thesexguy.com/xmlrpc.php
Location: http://example.com/mature
Content-Length: 0
So I recurse and try to fetch the page again after extracting the value of the Location header.
The recursive call should take me to http://example.com/mature, am I right? But I fail to extract the Location value. Why?
You need to either use CURLOPT_FOLLOWLOCATION or set cURL to retrieve full headers (CURLOPT_HEADER) and parse Location headers.
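If you go the parsing route, note that preg_match() requires pattern delimiters; 'Location:(.*)' has none, so the call fails with a warning and never matches. Here is a small sketch of the extraction step on its own, with a made-up helper name:

```php
<?php
// Hypothetical helper: returns the Location header value from raw response
// headers, or null when there is none.
function extractLocation(string $rawHeaders): ?string
{
    // The #...# delimiters are required. The "m" flag anchors ^ and $ at each
    // line, and "i" accepts any capitalisation of the header name.
    if (preg_match('#^Location:[ \t]*(.*)$#mi', $rawHeaders, $match)) {
        return trim($match[1]); // trim() also drops the trailing \r
    }
    return null;
}
```

In the function from the question, the result of the recursive call also has to be returned (return $this->get_follow_url($url);), otherwise the caller still receives the 301 response. Setting CURLOPT_FOLLOWLOCATION to true avoids the manual recursion altogether.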
