So I am trying to query the following URL: http://mil.sagepub.com/content/17/2/227.short
Here's the situation: On a browser such as Chrome or Safari it will:
307 to https://mil.sagepub.com/content/17/2/227.short and then
301 to
https://journals.sagepub.com/doi/abs/10.1177/03058298880170020901
which returns 200
On cURL, it will:
307 to https://mil.sagepub.com/content/17/2/227.short
which returns 503
So naturally, I go to Chrome and copy the request to https://mil.sagepub.com/content/17/2/227.short as a bash cURL command. I paste it into bash, and I get a 503. I try copying the Safari request to the same page as a bash cURL command, and also get a 503. So seemingly two cURL requests formatted to perfectly imitate the browser request both return a 503.
In my PHP code, I experimented with different cURL options, but it still only returns a 503. So I have three different OSs and PHP's cURL library getting 503 responses, while web browsers get a 200 OK response.
Here is the outgoing request my PHP code tried to send with cURL:
GET /content/17/2/227.short HTTP/2
Host: mil.sagepub.com
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36
authority: mil.sagepub.com
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
upgrade-insecure-requests: 1
cache-control: max-age=0
connection: keep-alive
keep-alive: 300
accept-charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
accept-language: en-US,en;q=0.9,de;q=0.8
dnt: 1
sec-ch-ua: "Google Chrome";v="105", "Not)A;Brand";v="8", "Chromium";v="105"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: none
sec-fetch-user: ?1
The method that sets all of the curl options and generates the above request header is as below:
$url = "https://mil.sagepub.com/content/17/2/227.short"
$full = true
$tor = false
$httpVersion = CURL_HTTP_VERSION_2_0 // HTTP/1.1 doesn't seem to work in this page
$this->userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
$this->curlTimeoutFull = 60
protected function getCurlOptions( $url, $full = false, $tor = false, $httpVersion = CURL_HTTP_VERSION_NONE ) {
$requestType = $this->getRequestType( $url );
if ( $requestType == "MMS" ) {
$url = str_ireplace( "mms://", "rtsp://", $url );
}
$options = [
CURLOPT_URL => $url,
CURLOPT_HEADER => 1,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_AUTOREFERER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_TIMEOUT => $this->curlTimeoutNoBody,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_COOKIEJAR => sys_get_temp_dir() . "checkifdead.cookies.dat",
CURLOPT_HTTP_VERSION => $httpVersion,
CURLINFO_HEADER_OUT => 1
];
if ( $requestType == "RTSP" || $requestType == "MMS" ) {
$header = [];
$options[CURLOPT_USERAGENT] = $this->mediaAgent;
} else {
// Properly handle HTTP version
// Emulate a web browser request but make it accept more than a web browser
if ( in_array( $httpVersion, [CURL_HTTP_VERSION_1_0, CURL_HTTP_VERSION_1_1, CURL_HTTP_VERSION_NONE] ) ) {
$header = [
// #codingStandardsIgnoreStart Line exceeds 100 characters
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
// #codingStandardsIgnoreEnd
'Accept-Encoding: gzip, deflate, br',
'Upgrade-Insecure-Requests: 1',
'Cache-Control: max-age=0',
'Connection: keep-alive',
'Keep-Alive: 300',
'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Accept-Language: en-US,en;q=0.9,de;q=0.8',
'Pragma: '
];
} elseif ( in_array( $httpVersion, [CURL_HTTP_VERSION_2, CURL_HTTP_VERSION_2_0, CURL_HTTP_VERSION_2_PRIOR_KNOWLEDGE, CURL_HTTP_VERSION_2TLS] ) ) {
$parsedURL = $this->parseURL( $url );
$header = [
'authority: ' . $parsedURL['host'],
//':method: get',
//':path: ' . $parsedURL['path'],
//':scheme: ' . strtolower( $parsedURL['scheme'] ),
// #codingStandardsIgnoreStart Line exceeds 100 characters
'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
// #codingStandardsIgnoreEnd
'accept-encoding: gzip, deflate, br',
'upgrade-insecure-requests: 1',
'cache-control: max-age=0',
'connection: keep-alive',
'keep-alive: 300',
'accept-charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'accept-language: en-US,en;q=0.9,de;q=0.8',
'dnt: 1'
];
if ( $requestType == "HTTPS" ) {
$header[] = 'sec-ch-ua: "Google Chrome";v="105", "Not)A;Brand";v="8", "Chromium";v="105"';
$header[] = 'sec-ch-ua-mobile: ?0';
$header[] = 'sec-ch-ua-platform: "' . $this->getRequestPlatform() . '"';
$header[] = 'sec-fetch-dest: document';
$header[] = 'sec-fetch-mode: navigate';
$header[] = 'sec-fetch-site: none';
$header[] = 'sec-fetch-user: ?1';
}
}
if ( $this->customUserAgent === false ) {
$options[CURLOPT_USERAGENT] = $this->userAgent;
} else {
$options[CURLOPT_USERAGENT] = $this->customUserAgent;
}
}
if ( $requestType == 'FTP' ) {
$options[CURLOPT_FTP_USE_EPRT] = 1;
$options[CURLOPT_FTP_USE_EPSV] = 1;
$options[CURLOPT_FTPSSLAUTH] = CURLFTPAUTH_DEFAULT;
$options[CURLOPT_FTP_FILEMETHOD] = CURLFTPMETHOD_SINGLECWD;
if ( $full ) {
// Set CURLOPT_USERPWD for anonymous FTP login
$options[CURLOPT_USERPWD] = "anonymous:anonymous#domain.com";
}
}
if ( $full ) {
// Extend timeout since we are requesting the full body
$options[CURLOPT_TIMEOUT] = $this->curlTimeoutFull;
$options[CURLOPT_HTTPHEADER] = $header;
if ( $requestType != "MMS" && $requestType != "RTSP" ) {
$options[CURLOPT_ENCODING] = 'gzip, deflate, br';
}
$options[CURLOPT_USERAGENT] = $this->userAgent;
} else {
$options[CURLOPT_NOBODY] = 1;
}
if ( $tor && self::$torEnabled ) {
$options[CURLOPT_PROXY] = self::$socks5Host . ":" . self::$socks5Port;
$options[CURLOPT_PROXYTYPE] = CURLPROXY_SOCKS5_HOSTNAME;
$options[CURLOPT_HTTPPROXYTUNNEL] = true;
} else {
$options[CURLOPT_PROXYTYPE] = CURLPROXY_HTTP;
}
return $options;
}
My question is, what am I missing here?
Unfortunately, this appears to be Cloudflare using TLS fingerprinting to distinguish cURL requests from actual browsers. There is probably no way to work around this. Please correct me if I'm wrong here.
If the issue here actually is TLS fingerprinting, then using an HTTP proxy like mitmproxy, HTTP Toolkit or NaïveProxy might be a possible workaround. For details see the discussions here and here. Another option could be to use curl-impersonate, as pointed out here.
UPDATE
Last night I did not have time to look at the HTML. Now that I have, it's disappointing: it is the HTML for the Cloudflare "Checking Security" message.
I retrieved the response header, and it is a 503 response.
I do not have security certificates for my curl and don't really have the time to install them now.
This one is going to be a challenge. I wish I had more time now because I like a challenge. I may work on this in the near future (weeks/months).
If you are going to continue you may need some of these curl options.
Start with CURLOPT_CAINFO / CURLOPT_CAPATH
CURLINFO_SSL_VERIFYRESULT
CURLINFO_HTTPAUTH_AVAIL
CURLOPT_HTTPAUTH the HTTP authentication method(s) to use.
CURLINFO_SSL_ENGINES
CURLINFO_CERTINFO true to output SSL certification information to STDERR on secure transfers
CURLINFO_APPCONNECT_TIME
CURLINFO_REDIRECT_COUNT
CURLINFO_REDIRECT_URL
CURLINFO_EFFECTIVE_URL
Options
CURLOPT_UNRESTRICTED_AUTH
CURLOPT_SSLVERSION Your best bet is to not set this and let it use the default
CURLOPT_SSL_OPTIONS
CURLOPT_KEYPASSWD
CURLOPT_PINNEDPUBLICKEY
CURLOPT_SSH_PRIVATE_KEYFILE
CURLOPT_CAPATH A directory that holds multiple CA certificates
CURLOPT_CAINFO The name of a file holding one or more certificates to verify the peer with.
CURLOPT_SSL_VERIFYHOST 0 to not check the names. 1 should not be used. 2 to verify that a Common Name field or a Subject Alternate Name field in the SSL peer certificate matches the provided hostname.
CURLOPT_SSL_VERIFYPEER false to stop cURL from verifying the peer's certificate.
CURLOPT_SSL_VERIFYSTATUS true to verify the certificate's status.
CURLOPT_CERTINFO true to output SSL certification information to STDERR on secure transfers.
CURLOPT_FRESH_CONNECT true to force the use of a new connection instead of a cached one.
CURLOPT_SSL_ENABLE_NPN
CURLOPT_SSL_ENABLE_ALPN
CURLOPT_SSL_FALSESTART true to enable TLS false start
CURLOPT_SSH_AUTH_TYPES
CURLOPT_LOGIN_OPTIONS
CURLOPT_SSL_CIPHER_LIST
CURLOPT_SSLCERTPASSWD
CURLOPT_SSLKEY
CURLOPT_SSLENGINE
CURLOPT_SSLKEYPASSWD
CURLOPT_SSLKEYTYPE
These are the SSL security error codes.
CURLE_SSL_CONNECT_ERROR (35)
CURLE_SSL_ENGINE_NOTFOUND (53)
CURLE_SSL_ENGINE_SETFAILED (54)
CURLE_SSL_CERTPROBLEM (58)
CURLE_SSL_CIPHER (59)
CURLE_PEER_FAILED_VERIFICATION (60)
CURLE_SSL_ENGINE_INITFAILED (66)
CURLE_LOGIN_DENIED (67)
CURLE_SSL_CACERT_BADFILE (77)
CURLE_SSH (79)
CURLE_SSL_CRL_BADFILE (82)
CURLE_SSL_ISSUER_ERROR (83)
CURLE_SSL_PINNEDPUBKEYNOTMATCH (90)
CURLE_SSL_INVALIDCERTSTATUS (91)
CURLE_AUTH_ERROR (94)
CURLE_SSL_CLIENTCERT (98)
Response Headers
HTTP/2 503
date: Fri, 07 Oct 2022 23:36:20 GMT
content-type: text/html; charset=UTF-8
x-frame-options: SAMEORIGIN
referer-policy: same-origin
cross-origin-embedder-policy: require-corp
cross-origin-opener-policy: same-origin
cross-origin-resource-policy: same-origin
permissions-policy: accelerometer=(),autoplay=(),camera=(),clipboard-read=(),clipboard-write=(),fullscreen=(),geolocation=(),gyroscope=(),hid=(),interest-cohort=(),magnetometer=(),microphone=(),payment=(),publickey-credentials-get=(),screen-wake-lock=(),serial=(),sync-xhr=(),usb=()
cache-control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
expires: Thu, 01 Jan 1970 00:00:01 GMT
set-cookie: __cf_bm=ephtbgHn7LX95.n_3NjMptBv8PFRRJP_6xtBw_9Ci0A-1665185780-0-ATjw+AOJfmTzUGIfBhlg8p6scov6AoznbSauiS3ofTN6KYEebpn4p+k3lU+5l6zZRwENm0kYDHa9zptcfePujs8=; path=/; expires=Sat, 08-Oct-22 00:06:20 GMT; domain=.sagepub.com; HttpOnly; Secure; SameSite=None
strict-transport-security: max-age=15552000
server: cloudflare
cf-ray: 756a755a18e95e68-TPA
End of Update
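Since the 503 above is a challenge page rather than a real server error, it can help to detect that case explicitly in code. Below is a minimal sketch, not a definitive check: the function name is hypothetical, the markers are taken from the headers and challenge HTML shown in this answer, and treating 403 the same as 503 is my assumption.

```php
<?php
// Hypothetical helper (not part of the original code): guesses whether a
// response is a Cloudflare challenge page rather than the real document.
function looksLikeCloudflareChallenge(int $status, array $headers, string $body): bool
{
    // Normalize header names to lower case (HTTP/2 header names already are).
    $headers = array_change_key_case($headers, CASE_LOWER);
    $server = strtolower($headers['server'] ?? '');

    // The challenge seen here came back as 503; 403 is an assumption.
    $suspectStatus = in_array($status, [403, 503], true);
    $fromCloudflare = $server === 'cloudflare' || isset($headers['cf-ray']);

    // Markers taken from the challenge HTML shown in this answer.
    $hasMarker = strpos($body, 'id="challenge-form"') !== false
        || strpos($body, '/cdn-cgi/challenge-platform/') !== false;

    return $suspectStatus && $fromCloudflare && $hasMarker;
}
```

This only tells you that you were challenged; it does not solve the challenge.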
I tried the cURL request below and it appeared to work.
When I used this header, it did not work:
header("Content-Type: text/html; charset=UTF-8");
Notice the header() call below.
<?php
header("Content-Type: text/plain; charset=UTF-8");
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://journals.sagepub.com/doi/abs/10.1177/03058298880170020901');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
// CURLOPT_HTTPHEADER expects a list of "Name: value" strings, not a name => value map.
curl_setopt($ch, CURLOPT_HTTPHEADER, [
'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.5',
'Accept-Encoding: gzip, deflate, br',
'DNT: 1',
'Connection: keep-alive',
'Upgrade-Insecure-Requests: 1',
'Sec-Fetch-Dest: document',
'Sec-Fetch-Mode: navigate',
'Sec-Fetch-Site: cross-site',
'Pragma: no-cache',
'Cache-Control: no-cache',
]);
curl_setopt($ch, CURLOPT_COOKIE, 'JSESSIONID=aaaf4hNAtZPHz_3k0KSoy; SERVER=lZreQKWihabND4/rEplMDWnYKk2egCfX; __cf_bm=PzjJU2qBrjwilfLqClzWC.zRk49hsY6b7e4F8WtMiMA-1665105299-0-AYf2x4A2SmmdyIQGUHAY0jdkGZtI3qyt3W48WENOL6tGZLEYJ/IqcD5GAWn10V5J+khYliOD7yhKrtVlGXwpawI=; MAID=cqVoyumi6JY4MYyiVyj//w==; MACHINE_LAST_SEEN=2022-10-06T18%3A15%3A00.400-07%3A00; usprivacy=1Y--');
$response = curl_exec($ch);
echo $response;
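A side note on CURLOPT_HTTPHEADER: it expects a plain list of "Name: value" strings, and as far as I can tell the array keys are ignored. If the headers are kept as a name => value map elsewhere in the code, a small helper can convert them. This is a sketch only; the function name is hypothetical.

```php
<?php
// Hypothetical helper: turns ['Name' => 'value'] into ['Name: value'],
// the list-of-strings shape that CURLOPT_HTTPHEADER expects.
function toCurlHeaderList(array $headers): array
{
    $list = [];
    foreach ($headers as $name => $value) {
        $list[] = $name . ': ' . $value;
    }
    return $list;
}

$headers = toCurlHeaderList([
    'Accept-Language' => 'en-US,en;q=0.5',
    'DNT' => '1',
]);
// $headers can then be passed as: curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
```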
This is the response
<!DOCTYPE html>
<html lang="en-US">
<head>
<title>Just a moment...</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
<meta name="robots" content="noindex,nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<link href="/cdn-cgi/styles/challenges.css" rel="stylesheet" />
<meta http-equiv="refresh" content="35">
</head>
<body class="no-js">
<div class="main-wrapper" role="main">
<div class="main-content">
<h1 class="zone-name-title h1">
<img class="heading-favicon" src="/favicon.ico"
onerror="this.onerror=null;this.parentNode.removeChild(this)" />
journals.sagepub.com
</h1>
<h2 class="h2" id="challenge-running">
Checking if the site connection is secure
</h2>
<noscript>
<div id="challenge-error-title">
<div class="h2">
<span class="icon-wrapper">
<div class="heading-icon warning-icon"></div>
</span>
<span id="challenge-error-text">
Enable JavaScript and cookies to continue
</span>
</div>
</div>
</noscript>
<div id="trk_jschal_js" style="display:none;background-image:url('/cdn-cgi/images/trace/jsch/nojs/transparent.gif?ray=7562d7b24c405e68')"></div>
<div id="challenge-body-text" class="core-msg spacer">
journals.sagepub.com needs to review the security of your connection before proceeding.
</div>
<form id="challenge-form" action="/doi/abs/10.1177/03058298880170020901?__cf_chl_f_tk=2isUlGhlWW3V2j9Kjj8DwMFDI2e5zBKVHQMe3nobTys-1665105922-0-gaNycGzNCOU" method="POST" enctype="application/x-www-form-urlencoded">
<input type="hidden" name="md" value="PQaRto6SpdNW5tB2CzRq9tnhClVf9vWYghzKuRjDWOQ-1665105922-0-AeraO7jbF6VbQcW0RvUhPT1qfbCcUTYiREg7FWQynA4OVtiTbls-NlCSXpbPLK_-bVRyI6_Wphk1k1XY_lxg0MVXMZ2tT5RmeYGnJYD5T-de61VfREEcZtpE5oaFnFmWVY38q9TYjq1CllDiszTadCdO-j4yvW51pLjc-LT8Z4OIHRgA1TG92oykPVkBoKgUcuUVGpZhsklb5S7Mz5h31OlmT8X_oMpHigkNwFWCVnuxl5_6FePRATc-hGFOYs6YFMu6P2GlAOigwfjRBmYi1ftmoQqBXK7wmb9HHmzrz1_BurTIQMFZyL-GExcvzgMud539jGN1mhQMiEfA44cezwbDBCp_TZLvF0l4_jx6ON49_M1aKzD3uM0Y7AO7WY7RnoED4ljPYcBvOQJyIF0QY_LLCZPoApA2pP4ku8J3uHa1ziS_fL6vOJD-cEkHMKMEFWMoOYCNYVQg1z_x0k7Pj2O_7svXqMyqk990LloFkmezlHPnw7k58EE8A8-KZavGjKAZnVKI4nZ5Kgr1ng8IDwsk64EiblwOTmvQoGBDObBxNI5QKmHY1of7zyaSlv233SzrJpn1JMP8o1gqtIVkQKTsV5cz8RQ59Zdrfc30zJhtc0R4REKiX7KaQ6hfmoZQLkuocj_TIF1cFYcF_UD_sl1gqV4pKejaZXtYc-zScMFCQBBPlgRIN3z2QD47WFCBR8-AQWgq2mkxE8GvbfwgoBw" />
<input type="hidden" name="r" value="4ga_OQoCt0ZP_Qytdr9ME00oPYmCHboqWnNiiVPIL4o-1665105922-0-AQ5V/dQHa2NaMlQDJ3v4D9IAB2MuFm30IpZKei7Z4tI9fCuFXT0VWdD+l9FshGYx5YBz5u86Ym5/ZKwJaw3A93MqGZzKysYqp2BUPENx8Tbm0XGZpI0s6SBb+AgVwnSbe0uKK28RBHYsTCTA2JX0Op5rWLvJT0o7G0UBkwBjoBwBYUGB4zcrPpGUBFZW4izVLm4kSdkkpKs7/n3/1nvIIfmEWBCm1Z6IoLhPFIan4aKMhymO96qSKt25TAhtBzNt/dv/r9iTb06EGlsF3r2CEUmlDP1VBjo9jYqIEiR9/5tiifCQ52JKHJZmOLuH0lqwicWNISfCvhrzMtdWKpUW3hSCnSNlSIr6x0H+ny4NFK8hg0y/b1Y4oFsqG9h4mXK9QDojLMiK4kK6/yz1a3snUe58xad4x8McL6JlqHFFUWzWxrs1IfQpUFBUSYnD12h0h00QyErge5qIb1ufZpaz/M+hiODNQhsj0NQPsWjrZ6RW0RAElJhgWn3/gLa/+BEuWxQc/raT6pxjNiMZLPAYV/H5B7q5hp2LBFBBaK4rH3QHxxunPlyrYAyPI9ke7Hqhwceliv31viTrVOuHeqkpw1zenp/6gz6N4ZdQqEMY/G2/5/cXsUSIPaAoB2TNYWcHJchtCJ8ANeikiNw7Wopo8CB54D48zrkYtu9Me3FMVBuPMNonvZy/uHnPGqqd67yYcR4ZqN0ceol85dYitQbExdz3Atr7civtdDD5Peu9ToRRpM+Jq1cKt9m8cLuNldf8WEIR7il1ugLhV4f8Q2xG/AuB1S2WoDwx5RqrUf2XWPdo3DVuk/W1+5wN/TWp0wMiiwk1xYk5LwReuP+YilaIkkPnJIAEUCpkR+/e0pHS/ceUu4Khx7X1Fw/WTTITU+bJ+3cUmvV9Vc1IEluULfeuBWylZ3fsW7qfeBIaJA6/1KtJFXPD7ggh0dees5jQrk0t2k0BkbCGcHsKS1PKL86BM5IzQAs5jk9bzxJC13J91iCB09kj7+gHnbvSniAw+luNAFZEQ1Ipt239rMWmSeSrDrQA/TyuETkl5WLvLY6LOYgmb6JQDRLtacWeqPt7U45e5OsCpUrdWcwTQR1K1yHh/DY8e9627S7ZhWpBcOPbfsvO9JOkTJqn2am/TWa2V0VbSNSHvdW2WmNnz8yLXTlo5L+QzGFTf06ywJV7luXHaTRAskV6qsaTl/5oYPmG+oH5sffEo8jr+xZ6iRYbgDMAn1Bo3O3IgFKidTpGcQ7GBfkBcvP2NUpqVq41rcHnmgyDLrru1Zi5ZTFir4iw7JMOYw9YNNw0Qy4iCLiFdFpuQaBjJC0faS890a+jjnsQvf9el/aDgrXkrGWlgvUn4EITE5Jc5b2XIm15DKCOqLJEdojUAgKdoxrQxoYYxsLVlbN8/eRB8Cuxus/BGH3Kxhbf3w8moEnyrvh+ZjzuSiEaYZYJx7I1mTJbLSdUyp74w5exEyX7U3h+1PCT4nd2tY5cS5teMVxmu+Q3Q8UFe7fsN49eMaGh1ggLScCH0zZtnt5Oms6lehrg8WfFdiaWWEvLTLEExO61IsGIWnWxcbh/cY1qeiD5E2W3RYei+XTsRb5kwFKilD248c159QJCepcGM0YZL7Ax6o0Bw4z5iDc+P5tJZN1qBIful2NJ40FF4jM+oDwf+8AzsxNilVoubBZvpB3qunvjMavbw3GoBPIDGi8gn19JznYK1Ccq+p/OAysjqpONhwB5vihBskpp6c5UZMIMXxGkfxGfKBzhO+JE/hvPvT9d4e/QpjOIBBT9tEIaSfatD1SBhVOR/n+qTCm0cEzJSfby6juHsIGi5lCBgIcngT5zx1bIFcDlxTBcFerq1W4sB61W0XFBgfj8z4uymavOlU5wnm8+y7yA0cHtZBfH08yqhg
4zTKrM4iXzdfqmkqaYbxMT1x8unIezyqYCSvprrsYWmgNBlaDjDwvgAGGR9EPVg8LNDd9vA7F+zwNrVuHlwMB4XBRGoEUFd/3LAr5agzDY5fqqAVd3p2T6gHAVBOcaTPZHzNMuOdCQucaWq/keg9x3raSCFQcEXRMaS63uMB7tkScU+V8AAMooLVY+5Cc801G6pHfT8g/O4Ykd7hpQIFysIzRzYn4tzZzR/kXPLNZw+ISkbArswMfw12ICoChRWiWkoe8GB/bCUuplxv0K4GDaxZLzcrrKYyo73g4+EAMCJl3LapBmso6dEo1J"/>
</form>
</div>
</div>
<script>
(function(){
window._cf_chl_opt={
cvId: '2',
cType: 'non-interactive',
cNounce: '61335',
cRay: '7562d7b24c405e68',
cHash: '10218f759e4ab19',
cUPMDTk: "\/doi\/abs\/10.1177\/03058298880170020901?__cf_chl_tk=2isUlGhlWW3V2j9Kjj8DwMFDI2e5zBKVHQMe3nobTys-1665105922-0-gaNycGzNCOU",
cFPWv: 'b',
cTTimeMs: '1000',
cTplV: 4,
cTplB: 'cf',
cRq: {
ru: 'aHR0cHM6Ly9qb3VybmFscy5zYWdlcHViLmNvbS9kb2kvYWJzLzEwLjExNzcvMDMwNTgyOTg4ODAxNzAwMjA5MDE=',
ra: '',
rm: 'R0VU',
d: 'hUB8qW1/Rbwx33YUzcVMTqUu+pyyMH1g4rZFyeT71nouHfNa/kZzFRdfCTOxboH8qp84VRepH5UjSKXMIbGKvMYDMxYCMhms0yymP/QQVIWQyGHD19wnRWegLpiSX4mWkM/LS30eF/16qC+eEte4h6V1m/FL2qCQXsFqA9bhdq8v5IZ8soop1L7Mpzr6cCI/7rE4kJrxYtSBsrAdBN+zf9uZQpKszC1++hNaGtkek7bSSe5Ouuq6ilnV1PY+uC5bpilAe4B7rDxtBabC6JKwQ79AobM0TqDR99geXWC/ratOOAJHek7aV0bq3wdLJywqaCRNQ731sg/oLws049U9s9xh93wz8hlpAXqZQ7v/hItjco/USW4JsmwHidUFLWBM1d7HSLLVFdGSw7rAs9QG0mRv22jNCwH2A2PPlW0vQJKZ8VBwJjbL6a3oPXkDsp7l9UhV7a7Fu2gP/+vErmH7JsbWj+4thWgCSGhCwRiUuX1R1yvoMM+PE3WbV1Mg3UZeshibgBXZXzoxN+KEk9QXa9mL2FBqEc0m5k73uNgvl5PlpDCQLkXW1edTJFpwgxUqoNl/Cv2DbT8o9gyhxVcvmg==',
t: 'MTY2NTEwNTkyMi45NDIwMDA=',
m: 'uhMFR0Zz2KvO1IklAUNSaC4Dp7dCmCc5//+PPZeITRY=',
i1: '00a08NzJ+5CkmeYGE9ULuA==',
i2: '05tMwHIIyy7WJUV2JWbeXA==',
zh: 'gw7YMdbZ1M4iQ6cbqLPC730Ml6kaQ+3i4OTRjaElasU=',
uh: 'DV4j3Tmrbi5Rs1q3ahwVS6SgbPbI7np5884QO1u1Cgg=',
hh: 'c9ogzZPyf3xtUVOiYSAQbEsbym/d5b1rPQM2Rm/OUTE=',
}
};
var trkjs = document.createElement('img');
trkjs.setAttribute('src', '/cdn-cgi/images/trace/jsch/js/transparent.gif?ray=7562d7b24c405e68');
trkjs.setAttribute('style', 'display: none');
document.body.appendChild(trkjs);
var cpo = document.createElement('script');
cpo.src = '/cdn-cgi/challenge-platform/h/b/orchestrate/jsch/v1?ray=7562d7b24c405e68';
window._cf_chl_opt.cOgUHash = location.hash === '' && location.href.indexOf('#') !== -1 ? '#' : location.hash;
window._cf_chl_opt.cOgUQuery = location.search === '' && location.href.slice(0, -window._cf_chl_opt.cOgUHash.length).indexOf('?') !== -1 ? '?' : location.search;
if (window.history && window.history.replaceState) {
var ogU = location.pathname + window._cf_chl_opt.cOgUQuery + window._cf_chl_opt.cOgUHash;
history.replaceState(null, null, "\/doi\/abs\/10.1177\/03058298880170020901?__cf_chl_rt_tk=2isUlGhlWW3V2j9Kjj8DwMFDI2e5zBKVHQMe3nobTys-1665105922-0-gaNycGzNCOU" + window._cf_chl_opt.cOgUHash);
cpo.onload = function() {
history.replaceState(null, null, ogU);
};
}
document.getElementsByTagName('head')[0].appendChild(cpo);
}());
</script>
<div class="footer" role="contentinfo">
<div class="footer-inner">
<div class="clearfix diagnostic-wrapper">
<div class="ray-id">Ray ID: <code>7562d7b24c405e68</code></div>
</div>
<div class="text-center">Performance & security by <a rel="noopener noreferrer" href="https://www.cloudflare.com?utm_source=challenge&utm_campaign=j" target="_blank">Cloudflare</a></div>
</div>
</div>
</body>
</html>
Using header("Content-Type: text/html; charset=UTF-8"); I got this:
I need to log in to http://auto.vsk.ru/login.aspx by making a POST request to it from my site.
I wrote a JS Ajax function that sends a POST request to a PHP script on my server, which in turn sends the cross-domain request via cURL.
post.php
<?php
function request($url,$post, $cook)
{
$ch = curl_init();
$curlConfig = array(
CURLOPT_URL => $url,
CURLOPT_POST => 1,
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_COOKIEFILE => $cook,
CURLOPT_COOKIEJAR => $cook,
CURLOPT_USERAGENT => '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; Trident/7.0; Touch; .NET4.0C; .NET4.0E; Tablet PC 2.0)"',
CURLOPT_FOLLOWLOCATION => 1,
CURLOPT_REFERER => $url,
CURLOPT_POSTFIELDS => $post,
CURLOPT_HEADER => 1,
);
curl_setopt_array($ch,$curlConfig);
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
$result = request($_POST['url'], $_POST['data'], $_POST['cook']);
if ($result === FALSE)
echo('error');
else
echo($result);
?>
Js code:
function postcross(path,data,cook,run)
{
requestp('post.php','url='+path+'&data='+data+'&cook='+cook, run);
}
function requestp(path, data, run)
{
var http = new XMLHttpRequest();
http.open('POST', path, true);
http.setRequestHeader('Content-type', 'application/x-www-form-urlencoded');
http.onreadystatechange = function()
{
if(http.readyState == 4 && http.status == 200)
{
run(http);
}
}
http.send(data);
}
postcross('http://auto.vsk.ru/login.aspx',encodeURIComponent('loginandpassord'),'vskcookies.txt',function(e){
document.getElementById('container').innerText=e.responseText;
});
The HTML page I get in the response says two things:
My browser is not Internet Explorer and I should switch to it (actually the site works in Google Chrome; at least I can log in).
My browser doesn't support cookies.
The cookie issue is very similar to this (veeeery long) question. The file vskcookies.txt is created on my server, it is updated after the POST request, and it stores cookies.
As for IE, at first I thought the site checks the browser from JS, but that is wrong, because JS doesn't run at all: I only read the HTML page as plain text, and it already contains the notification about IE.
So I wondered: what if I am making the cURL request wrong? I wrote a new PHP script that shows the request headers. Here is the source:
head.php
<?php
foreach (getallheaders() as $name => $value)
{
echo "$name: $value\n";
}
?>
The result of postcross('http://mysite/head.php',encodeURIComponent('loginandpassord'),'vskcookies.txt',function(e){ document.getElementById('container').innerText=e.responseText; }):
Host: my site
User-Agent: "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; Trident/7.0; Touch; .NET4.0C; .NET4.0E; Tablet PC 2.0)"
Accept: */*
Content-Type: application/x-www-form-urlencoded
Referer: mysite/head
X-1gb-Client-Ip: my ip
X-Forwarded-For: ip, ip, ip
X-Forwarded-Port: 443
X-Forwarded-Proto: https
X-Port: 443
Accept-Encoding: gzip
X-Forwarded-URI: /head
X-Forwarded-Request: POST /head HTTP/1.1
X-Forwarded-Host: my site
X-Forwarded-Server: my site
Content-Length: 823
Connection: close
For some reason there is no Cookie: header, but the user agent is IE, as I mentioned.
I also tried replacing the head.php source with
print_r($_COOKIE);
and got an empty array.
Am I doing something wrong, or is it the site's bot protection?
Update 1
Cookies show up only if I pass them through CURLOPT_COOKIE.
So I think I will leave CURLOPT_COOKIEFILE => $cook as it is, and pass something like file_get_contents($cook) to CURLOPT_COOKIE, although that includes useless information.
Important Update 2
Okay, I was probably just being stupid. The response HTML page does contain the messages about IE and disabled cookies, but they are in a div that is display:none and are only shown by JS.
So it seems my attempts fail for other reasons.
When I try to implement auto-complete using the code below :
$('#keyword').autocomplete({
source : '/Dev/pages/search.php',
minLength : 3,
type : 'POST',
select: function( event, ui )
{
$(this).data("autocomplete").menu.element.addClass("yellow");
}
})
.data( "ui-autocomplete" )._renderItem = function( ul, item )
{
console.log(item);
return $( "<li>" )
.append( "<a>" + add3Dots(item.name,20) + "</a>" )
.appendTo( ul );
};
if (isset($_POST["term"])){
$term = trim($_GET['term']);
$parts = explode(' ', $term);
$p = count($parts);
$a_json = array();
$a_json_row = array();
$search = connexion::bdd_test();
$requete = "SELECT name from BDD_TEST.companies";
for($i = 0; $i < $p; $i++) {
$requete .= ' WHERE name LIKE ' . "'%" . $conn->real_escape_string($parts[$i]) . "%'";
}
$result = $search->query($requete);
while($donnees = $result->fetch(PDO::FETCH_ASSOC)) {
$a_json_row["name"] = $data['name'];
array_push($a_json, $a_json_row);
}
}
else
{
$a_json['call']=false;
$a_json['message']="Problem to collect word.";
}
$json = json_encode($a_json);
print_r($json);
When I test, the if condition is not satisfied and I get the message from the else branch: "Problem to collect word."
That means $_POST["term"] is not defined.
How can I retrieve the input value?
To be sure that the values have been sent, you can inspect the headers the browser sent to the web server with PHP for testing purposes.
This is possible using the apache_request_headers() function but it only works if PHP is run on Apache as a module.
Using apache_request_headers():
If PHP is run on Apache as a module then the headers the browser sends can be retrieved using the apache_request_headers() function. The following example code uses print_r to output the value from this function call:
print_r(apache_request_headers());
For an example request from Google Chrome, the above would output something similar to the following:
Array
(
[Host] => www.testing.local
[Connection] => keep-alive
[User-Agent] => Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532.0 (KHTML, like Gecko) Chrome/4.0.206.1 Safari/532.0
[Cache-Control] => max-age=0
[Accept] => application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
[Accept-Encoding] => gzip,deflate,sdch
[Accept-Language] => en-US,en;q=0.8
[Accept-Charset] => ISO-8859-1,utf-8;q=0.7,*;q=0.3
)
Alternative when PHP is run as a CGI:
If PHP is not being run as a module on Apache, the browser headers should be stored in the $_SERVER array, with the key being the request header name converted to upper case, hyphens replaced with underscores, and prefixed with HTTP_.
The relevant lines from $_SERVER for the same request are as follows:
[HTTP_HOST] => www.testing.local
[HTTP_CONNECTION] => keep-alive
[HTTP_USER_AGENT] => Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532.0 (KHTML, like Gecko) Chrome/4.0.206.1 Safari/532.0
[HTTP_CACHE_CONTROL] => max-age=0
[HTTP_ACCEPT] => application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
[HTTP_ACCEPT_ENCODING] => gzip,deflate,sdch
[HTTP_ACCEPT_LANGUAGE] => en-US,en;q=0.8
[HTTP_ACCEPT_CHARSET] => ISO-8859-1,utf-8;q=0.7,*;q=0.3
The alternative method is to create our own function if the apache_request_headers() function does not exist, which extracts just the values from $_SERVER and converts the key names to the same style as apache_request_headers(). This works like so:
if(!function_exists('apache_request_headers')) {
function apache_request_headers() {
$headers = array();
foreach($_SERVER as $key => $value) {
if(substr($key, 0, 5) == 'HTTP_') {
$headers[str_replace(' ', '-', ucwords(str_replace('_', ' ', strtolower(substr($key, 5)))))] = $value;
}
}
return $headers;
}
}
The new function is only declared if a function with that name does not already exist. The end result is that whether or not the internal PHP function exists, you will be able to call a function with this name in your code.
A loop runs through the $_SERVER array, and any entry whose key starts with HTTP_ is added to the array, with the key translated via a series of function calls into the same format as returned by apache_request_headers().
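To make the key translation easier to follow, here it is isolated into a function of its own. This is just an illustrative sketch; the function name is mine and is not part of PHP.

```php
<?php
// Standalone version of the key translation used in the fallback function.
function serverKeyToHeaderName(string $key): string
{
    // Drop the "HTTP_" prefix, then turn ACCEPT_LANGUAGE into Accept-Language.
    $name = strtolower(substr($key, 5));
    return str_replace(' ', '-', ucwords(str_replace('_', ' ', $name)));
}

echo serverKeyToHeaderName('HTTP_ACCEPT_LANGUAGE'), "\n"; // Accept-Language
echo serverKeyToHeaderName('HTTP_HOST'), "\n";            // Host
```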
View HTTP headers in Google Chrome
Press F12 on Windows or ⌥⌘I on a Mac to bring up the Chrome developer tools.
The "Network" tab lists the requests; clicking one of them shows its headers in a tab on the right.
Try to retrieve value(s) without knowing HTTP methods
You can detect which request type was used (GET, POST, PUT or DELETE) in PHP by using
$_SERVER['REQUEST_METHOD']
For more details please see the documentation for the $_SERVER variable.
Or you can retrieve value(s) using $_REQUEST['your_variable'].
Note $_REQUEST is a different variable than $_GET and $_POST, it is treated as such in PHP -- modifying $_GET or $_POST elements at runtime will not affect the elements in $_REQUEST, nor vice versa.
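Putting the last two points together, here is a minimal sketch that picks the superglobal matching the request method instead of assuming POST. The function name is hypothetical. As far as I know, jQuery UI's autocomplete requests a string source with GET and a term query parameter, so in the question's setup the value would arrive in $_GET rather than $_POST.

```php
<?php
// Hypothetical helper: returns the trimmed search term, or null if absent.
// $method, $get and $post stand in for $_SERVER['REQUEST_METHOD'], $_GET, $_POST.
function readSearchTerm(string $method, array $get, array $post): ?string
{
    $source = strtoupper($method) === 'POST' ? $post : $get;
    if (!isset($source['term']) || !is_string($source['term'])) {
        return null;
    }
    $term = trim($source['term']);
    return $term === '' ? null : $term;
}

// Typical call in a request handler:
// $term = readSearchTerm($_SERVER['REQUEST_METHOD'], $_GET, $_POST);
```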
I am trying to make many requests to my website from PHP, using proxies and custom headers. The script reads proxies line by line from a text file and uses each one in file_get_contents. However, although the text file contains 3 proxies (one per line), the script only uses one and then ends. (I am executing it from the command line.)
<?php
$proxies = explode("\r\n", file_get_contents("proxies.txt"));
foreach($proxies as $cpr0xy) {
$aContext = array(
'http' => array(
'proxy' => "tcp://$cpr0xy",
'request_fulluri' => true,
'method'=>"GET",
'header'=>"Accept-language: en\r\n" .
"User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36\r\n"
), );
$rqcon = stream_context_create($aContext);
$destc = file_get_contents("http://domain.com/file.php", False, $rqcon);
echo $destc;
} ?>
Right now it's only using the first proxy, and it returns the value correctly, but then the script stops. My goal is for it to keep making requests until it runs out of proxies in proxies.txt.
This should work for you:
$proxies = explode(PHP_EOL, file_get_contents("proxies.txt"));
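One caveat: PHP_EOL is the line ending of the platform PHP runs on, not necessarily the one used in the file. If proxies.txt might have been saved with different line endings, a split that accepts any of them is more forgiving. Here is a minimal sketch, with a hypothetical function name:

```php
<?php
// Hypothetical helper: splits file contents into non-empty, trimmed lines,
// regardless of whether the file uses \n, \r\n or \r line endings.
function parseProxyList(string $contents): array
{
    $lines = preg_split('/\R/', $contents);
    $lines = array_map('trim', $lines);
    // array_values() reindexes after empty lines are filtered out.
    return array_values(array_filter($lines, 'strlen'));
}

// Usage, assuming proxies.txt holds one host:port per line:
// $proxies = parseProxyList(file_get_contents('proxies.txt'));
```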
I want to call a URL and get the result with PHP by using file_get_contents (I know cURL, but first I want to try it with file_get_contents). In my case it's a request to the Magento shop system, which requires a prior login to the backend.
If I open the URL manually in my browser, the right page comes up. If I request the URL with file_get_contents, I am also logged in (because I added the cookie to the request), but every time I only get the dashboard home page; maybe something causes a redirect.
I tried to simulate the same HTTP request that my browser sends. My question is: is there a possibility to send the same header data (Cookie, Session-ID etc.) directly as a parameter to file_get_contents without manual serialization?
It's a common PHP question, the basic script would be:
$postdata = http_build_query(
array(
'var1' => 'some content',
'var2' => 'doh'
)
);
$opts = array('http' =>
array(
'method' => 'POST',
'header' => 'Content-type: application/x-www-form-urlencoded',
'content' => $postdata
)
);
$context = stream_context_create($opts);
$result = file_get_contents('http://example.com/submit.php', false, $context);
And in my case the code is:
$postdata = http_build_query(
array
(
'selected_products' => 'some content',
)
);
$opts = array('http' =>
array
(
'method' => 'POST',
'header' => "Content-type: application/x-www-form-urlencoded; charset=UTF-8\r\n".
"Cookie: __utma=".Mage::getModel('core/cookie')->get("__utma")."; ".
"__utmz=".Mage::getModel('core/cookie')->get("__utmz")."; ".
"__utmc=".Mage::getModel('core/cookie')->get("__utmc")."; ".
"adminhtml=".Mage::getModel('core/cookie')->get("adminhtml")."\r\n".
"X-Requested-With: XMLHttpRequest\r\n".
"Connection: keep-alive\r\n".
"Accept: text/javascript, text/html, application/xml, text/xml, */*\r\n".
"User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0",
'content' => $postdata
)
);
$context = stream_context_create($opts);
var_dump(file_get_contents($runStopAndRemoveProducts, false, $context ));
The result should be the same error message I get in the browser when calling the URL manually ("please select some products" as plain text), but the response is the full dashboard home page as an HTML document.
I'm looking for a script like this. I want all parameters to be set automatically, without manually building the cookie string and the other headers :)
file_get_contents('http://example.com/submit.php', false, $_SESSION["Current_Header"]);
EDIT: I've found the mistake: two special GET parameters (isAjax=1 and form_key = Mage::getSingleton('core/session', array('name' => 'adminhtml'))->getFormKey()) are required. In my case the form_key caused the error. But the ugly cookie string is still there, so I'm still looking for a prettier solution.
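As far as I know, file_get_contents has no option to forward the current request's headers automatically, so the cookie string has to be assembled somewhere. A small helper at least keeps that out of the context array. This is a sketch only: the function name is hypothetical, and encoding values with rawurlencode() is my assumption about what the receiving side expects.

```php
<?php
// Hypothetical helper: builds a Cookie request header line from name => value pairs.
function buildCookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        // rawurlencode() is an assumption: it keeps the value free of ';' and spaces.
        $pairs[] = $name . '=' . rawurlencode((string) $value);
    }
    return 'Cookie: ' . implode('; ', $pairs);
}

echo buildCookieHeader(['adminhtml' => 'abc123', '__utmc' => '1']), "\n";
// Cookie: adminhtml=abc123; __utmc=1
```

The resulting line can be concatenated into the 'header' string of the stream context, followed by "\r\n".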
To me this looks like you are trying to write a hack for something that you can do more elegantly, the proper, fully documented way. Please have a look at the Magento API.
If you want to delete products (or do anything else):
http://www.magentocommerce.com/api/soap/catalog/catalogProduct/catalog_product.delete.html
You will get a proper response back to know if things have been successful. If there are things the API cannot do then you can extend/hack it if you wish.
To get started you will need an API user/pass and get up to speed with SOAP. The examples in the Magento documentation should suffice. Good luck!