How to fix 'Please enable cookies' Cloudflare error using PHP Curl - php

When using Postman locally on my machine, I can send the request without a problem and get a response back. Because I am sending the API an invalid token, I should receive this back:
{
"status": "Error",
"message": "Invalid API Token"
}
Using Postman's utility to generate PHP cURL code for this request, I get this:
$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_URL => "https://app.mobilecause.com/api/v2/reports/transactions.json",
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_ENCODING => "",
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 30,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => "GET",
    CURLOPT_POSTFIELDS => "",
    CURLOPT_COOKIESESSION => true,
    CURLOPT_COOKIEFILE => "cookie.txt",
    CURLOPT_COOKIEJAR => "cookie.txt",
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_HTTPHEADER => array(
        'Authorization: Token token="test_token"',
        "Content-Type: application/x-www-form-urlencoded",
        "cache-control: no-cache",
    ),
));
curl_setopt($curl, CURLOPT_VERBOSE, true);
$response = curl_exec($curl);
$err = curl_error($curl);
curl_close($curl);
if ($err) {
    echo "cURL Error #: " . $err;
} else {
    echo $response;
}
Running this code on my web server returns a page body that is a Cloudflare landing page, specifically this:
Please enable cookies.
One more step
Please complete the security check to access app.mobilecause.com
Why do I have to complete a CAPTCHA?
Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.
What can I do to prevent this in the future?
If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.
If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.
Cloudflare Ray ID: RAY_ID • Your IP_REDACTED • Performance & security by Cloudflare
I cannot explain why this happens. I have a valid 'cookie.txt' that is being written to, but it seems to be missing content.
The cookie that cURL writes to 'cookie.txt' for this request looks like this (potentially sensitive information redacted):
#HttpOnly_.app.mobilecause.com TRUE / FALSE shortStringOfNumbers __cfduid longStringOfNumbers
The cookies generated by Postman when executing the same request look like this (potentially sensitive information redacted):
__cfruid=longStringOfNumbers-shortStringOfNumbers; path=/; domain=.app.mobilecause.com; HttpOnly; Expires=Tue, 19 Jan 2038 03:14:07 GMT;
__cfduid=longStringOfNumbers; path=/; domain=.app.mobilecause.com; HttpOnly; Expires=Thu, 23 Jan 2020 04:54:50 GMT;
Essentially, it seems the PHP request is missing the '__cfruid' cookie. Could this be the cause?
Copying this exact code into http://phpfiddle.org/ produces the same Cloudflare landing page. Running it locally on my machine produces the expected result.

You're running into a Managed Challenge: https://developers.cloudflare.com/fundamentals/get-started/concepts/cloudflare-challenges/
The key question here is whether you own the zone. The site owner can add a managed challenge for pretty much any reason as part of their WAF: https://developers.cloudflare.com/waf/ . We could speculate that your traffic is being classified as bot traffic, or that a rule matches your user-agent string, but if you don't own the domain in Cloudflare, you have no control over the managed challenges served to you.
If you are the site owner, you can determine which rule is causing this managed challenge by taking the Cloudflare Ray ID and filtering for it under Security > Overview. You can then add a bypass to your firewall rule to exclude this PHP cURL traffic.
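Managed challenges cannot be solved from PHP, but if the rule keys on something simple like a missing User-Agent (PHP's cURL sends no User-Agent header by default), adding browser-like headers is a cheap thing to try first. A minimal sketch, reusing the URL and token from the question; the UA string itself is only an example:

```php
<?php
// PHP's cURL sends no User-Agent unless one is set; some WAF rules key on that.
$options = array(
    CURLOPT_URL            => "https://app.mobilecause.com/api/v2/reports/transactions.json",
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_TIMEOUT        => 15,
    CURLOPT_COOKIEFILE     => "cookie.txt",
    CURLOPT_COOKIEJAR      => "cookie.txt",
    // Any realistic browser UA string; this particular one is just an example.
    CURLOPT_USERAGENT      => "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    CURLOPT_HTTPHEADER     => array(
        'Authorization: Token token="test_token"',
        'Accept: application/json',
    ),
);

$curl = curl_init();
curl_setopt_array($curl, $options);
$response = curl_exec($curl); // false if the request fails outright
curl_close($curl);
```

If the challenge is triggered by bot-detection heuristics rather than a simple header rule, this will not help, and the Ray ID lookup is the reliable path.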

Related

PHP waiting for curl to finish before returning

I have two PHP files: one for "heavy lifting" and one for quick responses that marshals the request to the heavy lifter, so that the quick-response file can reply to the server immediately (at least, that is the goal). The premise is Slack's slash commands, which prefer an instant 200 to let the user know the command is running.
<?php
echo("I want this text to reply to server instantly");
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(E_ALL);
$code = '200';
$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_URL => "http://myheavyliftingfile.php",
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_ENCODING => "",
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 30,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => "POST",
    CURLOPT_POSTFIELDS => "datatobeusedbyheavylifter:data",
    CURLOPT_HTTPHEADER => array(
        "cache-control: no-cache",
        "content-type: application/x-www-form-urlencoded",
        "postman-token: 60757c65-a11e-e524-e909-4bfa3a2845fb"
    ),
));
$response = curl_exec($curl);
?>
What seems to be happening is that my response/echo doesn't get sent to Slack until my heavylifting.php cURL call finishes, even though I want my response to go out immediately while the heavy-lifting process runs separately. How can I have one PHP file acknowledge the request, kick off another process in a different file, and respond without waiting for the long process to finish?
Update
I do not wish to run multiple cURL calls at once; I just wish to execute one cURL call but not wait for it to return, so I can send a message back to Slack saying I received the request. My cURL call sends data to my other PHP file that does the heavy lifting. If this is still the same issue as defined in the duplicate, feel free to flag it again and I won't reopen.
The reason this does not work is that PHP cURL calls are always synchronous, and your timeout is set to 30 seconds, which far exceeds the maximum of 3 seconds allowed for slash commands.
But there is a fix to make this work. You just need these small changes:
1. Set the cURL timeout to a smaller value to ensure your first script completes below the 3-second threshold, e.g. set CURLOPT_TIMEOUT_MS to 400, which defines a timeout of 400 ms.
2. Set CURLOPT_NOSIGNAL to 1 in your first script. This is required for the timeout to work on UNIX-based systems.
3. Make sure to ignore timeout errors (cURL error 28) in your first script, since your cURL call should now always return a timeout error.
4. Make sure your second script is not aborted by the forced timeout by adding this line: ignore_user_abort(true);
See also this answer for a full example.
P.S.: You do not need any buffer flushing for this approach.
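Put together, the first script might look like this sketch; the endpoint URL is a placeholder, and 400 ms is the example value from the steps above:

```php
<?php
// Acknowledge Slack immediately, then fire-and-forget the long request.
echo "I want this text to reply to server instantly";

$curl = curl_init("https://example.com/heavylifting.php"); // placeholder URL
curl_setopt_array($curl, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => "datatobeusedbyheavylifter:data",
    CURLOPT_NOSIGNAL       => 1,   // needed for sub-second timeouts on UNIX
    CURLOPT_TIMEOUT_MS     => 400, // stop waiting after 400 ms
));
curl_exec($curl);

// Error 28 (operation timed out) is expected here; anything else is real.
$errno = curl_errno($curl);
if ($errno !== 0 && $errno !== 28) {
    error_log("curl error $errno: " . curl_error($curl));
}
curl_close($curl);
```

The second script then begins with ignore_user_abort(true); so it keeps running after the caller stops waiting.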

PHP - cURL should I set 'AUTOREFERER' when following redirects?

TL;DR
Why should or shouldn't I set CURLOPT_AUTOREFERER => true in my cURL function (that follows a limited number of redirects)?
Long(er) Version
I have a pretty standard cURL function that returns the headers for a given URL, following up to 10 redirects...
const SINGLETIMEOUT = 8; // Seconds (is this too long?)
public static function getHeaders($url, $userAgent) {
    // Initialize cURL object
    $curl = curl_init($url);
    // Set options
    curl_setopt_array($curl, array(
        CURLOPT_USERAGENT => $userAgent,
        CURLOPT_HEADER => true,
        CURLOPT_NOBODY => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS => 10,
        CURLOPT_AUTOREFERER => true,
        CURLOPT_TIMEOUT => SINGLETIMEOUT, // 8 seconds (safety!)
        CURLOPT_CONNECTTIMEOUT => SINGLETIMEOUT
    ));
    // Run it
    curl_exec($curl);
    // Get headers
    $headers = curl_getinfo($curl);
    // Close it
    curl_close($curl);
    return $headers;
}
The function getHeaders works great, exactly as expected. But so far in my testing, there is no difference in performance or results, whether I include CURLOPT_AUTOREFERER => true or not. There are plenty of references saying what CURLOPT_AUTOREFERER does, but beyond that I can't find anything going into more depth on that particular option.
Ok, so setting `CURLOPT_AUTOREFERER => true` will
... automatically set the Referer: header field in HTTP requests where it follows a Location: redirect
So what? Why does this matter? Should I keep it in or toss it? Will it cause the results to be different for some URLs? Will some domains return erroneous headers, the same as when I send an empty user agent?
And on, and on...
Most of the examples I found to make this function did not include it - but they also didn't include many of the other options that I'm including.
OK, some basic information first. According to Wikipedia:
The HTTP referer (originally a misspelling of referrer) is an HTTP header field that identifies the address of the webpage (i.e. the URI or IRI) that linked to the resource being requested. By checking the referrer, the new webpage can see where the request originated.
In the most common situation this means that when a user clicks a hyperlink in a web browser, the browser sends a request to the server holding the destination webpage. The request includes the referer field, which indicates the last page the user was on (the one where they clicked the link).
Referer logging is used to allow websites and web servers to identify where people are visiting them from, for promotional or statistical purposes.
However, here's an important detail: this header is supplied by the client, and the client can choose whether or not to supply it. In addition, if the client does supply it, it can supply any value it wants.
Because of this, developers have learned not to rely on the referrer value for anything other than statistics, given how easily it can be spoofed (you can actually set the Referer header yourself in the cURL call, instead of using CURLOPT_AUTOREFERER).
Therefore it's generally inconsequential whether you supply it when using crawlers or cURL. It's up to you whether you want to let the remote site know where you came from; it should work either way.
That being said, it's not impossible for a site to present different results based on the referrer. For example, I have seen a site that checked whether the referrer was Google in order to supply additional in-site search results, but that is the exception rather than the rule; other than that, sites should always be usable anyway.
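To make the two behaviours concrete, here is a short sketch (both URLs are placeholders): option A lets cURL manage the header across redirects; option B sets it by hand, which is exactly why servers cannot trust it.

```php
<?php
$curl = curl_init("https://example.com/start"); // placeholder URL
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_MAXREDIRS, 10);
curl_setopt($curl, CURLOPT_TIMEOUT, 10);

// Option A: cURL fills in Referer automatically on each followed redirect.
curl_setopt($curl, CURLOPT_AUTOREFERER, true);

// Option B: supply any Referer you like on the initial request.
// The server has no way to verify it, which is why the header is only
// trustworthy enough for statistics.
curl_setopt($curl, CURLOPT_REFERER, "https://www.google.com/");

$body = curl_exec($curl); // false on failure
curl_close($curl);
```

Note that CURLOPT_AUTOREFERER only affects requests that follow a Location: redirect; the initial request carries whatever CURLOPT_REFERER says, or nothing.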

Error 400 'bad request' when trying to cURL RSS feed

I'm trying to scrape the below feed (with permission) via PHP cURL:
http://www.safc.com/Home/RSS Feeds/News%20Feed
Loads fine in a browser, but gives me a 400 'bad request' with cURL.
$ch = curl_init($uri); //http://www.safc.com/Home/RSS Feeds/News%20Feed
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_ENCODING => '',
    CURLOPT_TIMEOUT => CURL_CONNECT_TIMEOUT,
    CURLOPT_USERAGENT => CURL_USER_AGENT,
    CURLOPT_SSL_VERIFYPEER => false,
    CURLOPT_FOLLOWLOCATION => true
));
$ret = curl_exec($ch);
Result is a 400; I know this from looking in curl_getinfo().
CURL_USER_AGENT is an arbitrary identifier; I added it after realising some other feeds wouldn't return content unless this header was present. I have tried removing the headers one by one, and tried adding a few more, but that approach feels a bit needle-in-a-haystack.
Before I approach the owners of the site, does anyone know how I might resolve this?
Use http://www.safc.com/home/rss%20feeds/news%20feed instead. Note the difference between "Home" and "home": there is a 301 redirect when you use "Home".
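Separately, note that the URL in the question contains a literal space ("RSS Feeds"); cURL does not encode this for you, and a raw space in the request line may itself be enough to provoke a 400 from some servers. A small sketch of percent-encoding each path segment before making the request:

```php
<?php
// rawurlencode() percent-encodes per RFC 3986, turning " " into "%20".
$base = 'http://www.safc.com';
$segments = array('home', 'rss feeds', 'news feed');
$uri = $base . '/' . implode('/', array_map('rawurlencode', $segments));

echo $uri; // http://www.safc.com/home/rss%20feeds/news%20feed
```

The resulting $uri can then be passed to curl_init() as usual.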

XML-RPC failing to respond to POST requests via cURL in PHP

I'm having some issues with calling WordPress XML-RPC via cURL in PHP. It's a WordPress.com hosted blog, and the XML-RPC file is located at http://sunseekerblogbook.com/xmlrpc.php.
Starting yesterday (or at least, yesterday was when it was noticed), cURL has been failing with error #52: Empty reply from server.
The code snippet we're using is below:
$ch = curl_init('http://sunseekerblogbook.com/xmlrpc.php');
curl_setopt_array($ch, [
    CURLOPT_HEADER => false,
    CURLOPT_HTTPHEADER => [
        'Content-Type: text/xml'
    ],
    CURLOPT_POSTFIELDS => xmlrpc_encode_request('wp.getPosts', [
        1,
        WP_USERNAME,
        WP_PASSWORD,
        [
            'number' => 15
        ]
    ]),
    CURLOPT_RETURNTRANSFER => true
]);
$ret = curl_exec($ch);
$data = xmlrpc_decode($ret, 'UTF-8');
Using cURL directly however, everything returns exactly as expected:
$output = [];
exec('curl -d "<?xml version=\"1.0\" encoding=\"UTF-8\"?><methodCall><methodName>wp.getPosts</methodName><params><param><value><int>1</int></value></param><param><value><string>' . WP_USERNAME . '</string></value></param><param><value><string>' . WP_PASSWORD . '</string></value></param><param><value><struct><member><name>number</name><value><int>15</int></value></member></struct></value></param></params></methodCall>" sunseekerblogbook.com/xmlrpc.php', $output);
$data = xmlrpc_decode(implode('', $output), 'UTF-8');
We've been able to query WordPress successfully since July 2013, and we're at a dead end as to why this has happened. It doesn't look like PHP or cURL has been updated or changed recently on the server, but the first code snippet now fails on every server we've tried it on (with PHP 5.4+).
Using the http://sunseekerblogbook.wordpress.com/xmlrpc.php link gives the same issue.
Is there anything missing from the PHP code that would cause this issue? That it's suddenly stopped working over 12 months down the line is what has flummoxed me.
Managed to fix it. Looking at the headers sent by cURL, the only differences were that the cURL command line uses Content-Type: application/x-www-form-urlencoded and that the user agent was set to User-Agent: curl/7.30.0.
The choice of content type didn't affect it, but setting a user agent sorted it! It seems WordPress.com (but not self-hosted WordPress.org sites running the latest v3.9.2) now requires a user agent for XML-RPC requests, though this hasn't been documented anywhere that I can find.
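For reference, the fix amounts to one extra option in the first snippet. The UA string below is arbitrary (per the answer, any non-empty value appeared to work); the request body is shown as a placeholder for what xmlrpc_encode_request() would build:

```php
<?php
// Placeholder for the body normally built by xmlrpc_encode_request().
$request_xml = '<?xml version="1.0"?><methodCall>...</methodCall>';

$ch = curl_init('http://sunseekerblogbook.com/xmlrpc.php');
curl_setopt_array($ch, [
    CURLOPT_HEADER         => false,
    CURLOPT_HTTPHEADER     => ['Content-Type: text/xml'],
    // The fix: send a non-empty User-Agent with the XML-RPC request.
    CURLOPT_USERAGENT      => 'my-xmlrpc-client/1.0', // arbitrary string
    CURLOPT_POSTFIELDS     => $request_xml,
    CURLOPT_RETURNTRANSFER => true,
]);
// $ret = curl_exec($ch);
// $data = xmlrpc_decode($ret, 'UTF-8');
curl_close($ch);
```

The commented-out lines show where the original exec/decode calls from the question would go.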

Scraping ASP.Net website with POST variables in PHP

For the past few days I have been trying to scrape a website but so far with no luck.
The situation is as following:
The website I am trying to scrape requires data from a previously submitted form. I have identified the variables required by the web app and have investigated which HTTP headers the original web app sends.
Since I have pretty much zero knowledge in ASP.net, thought I'd just ask whether I am missing something here.
I have tried different methods (CURL, get contents and the Snoopy class), here's my code of the curl method:
<?php
$url = 'http://www.urltowebsite.com/Default.aspx';
$fields = array('__VIEWSTATE' => 'averylongvar',
'__EVENTVALIDATION' => 'anotherverylongvar',
'A few' => 'other variables');
$fields_string = http_build_query($fields);
$curl = curl_init($url);
curl_setopt_array($curl, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_SSL_VERIFYPEER => 0, // Not supported in PHP
    CURLOPT_SSL_VERIFYHOST => 0, // at this time.
    CURLOPT_HTTPHEADER => array(
        'Content-type: application/x-www-form-urlencoded; charset=utf-8',
        'Set-Cookie: ASP.NET_SessionId='.uniqid().'; path: /; HttpOnly'
    ),
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => $fields_string,
    CURLOPT_FOLLOWLOCATION => 1
));
$response = curl_exec($curl);
curl_close($curl);
echo $response;
?>
The following headers were sent and received:
Request URL: http://www.urltowebsite.com/default.aspx
Request Method: POST
Status Code: 200 OK
Request Headers
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-us) AppleWebKit/533.18.1 (KHTML, like Gecko) Version/5.0.2 Safari/533.18.5
Form Data
A lot of form fields
Response Headers
Cache-Control: private
Content-Length: 30168
Content-Type: text/html; charset=utf-8
Date: Thu, 09 Sep 2010 17:22:29 GMT
Server: Microsoft-IIS/6.0
X-Aspnet-Version: 2.0.50727
X-Powered-By: ASP.NET
When I inspect the headers sent by the cURL script I wrote, it somehow does not send the form data, nor is the request method set to POST. This is where things seem to go wrong, but I don't know why.
Any help is appreciated!!!
EDIT: I forgot to mention that the result of the scraping is a custom session expired page of the remote website.
Since __VIEWSTATE and __EVENTVALIDATION are Base64 strings, I've used urlencode() for those fields:
$fields = array(
    '__VIEWSTATE'       => urlencode($averylongvar),
    '__EVENTVALIDATION' => urlencode($anotherverylongvar),
    'A few'             => 'other variables'
);
And worked fine for me.
Since __VIEWSTATE encodes the state of the page for a particular request (all of it packed into one big, apparently messy string), you cannot assume that the value you scraped earlier will be valid for your "mock" request (I'm quite sure it will not be ;)).
If you really have to deal with the __VIEWSTATE and __EVENTVALIDATION params, my advice is to follow another approach: scrape the content via Selenium or with an HtmlUnit-like library (though unfortunately I don't know if there's something similar in PHP).
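If you stay with plain PHP cURL, one common workaround is to GET the form page first and pull the current __VIEWSTATE and __EVENTVALIDATION out of the returned HTML before POSTing. A sketch using DOMDocument, with a canned HTML string standing in for the real page (the field values are invented):

```php
<?php
// Stand-in for the HTML a prior GET of Default.aspx would return.
$html = '<html><body><form>'
      . '<input type="hidden" name="__VIEWSTATE" value="dDwtMTMx...example" />'
      . '<input type="hidden" name="__EVENTVALIDATION" value="wEWAgL...example" />'
      . '</form></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$fields = array();
foreach ($doc->getElementsByTagName('input') as $input) {
    $name = $input->getAttribute('name');
    if ($name === '__VIEWSTATE' || $name === '__EVENTVALIDATION') {
        $fields[$name] = $input->getAttribute('value');
    }
}

// $fields can now be merged into the POST body via http_build_query().
echo http_build_query($fields);
```

The same loop can be extended to collect every hidden input, so the POST mirrors what a browser would actually submit.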
