Authenticating to webservice with Kerberos from PHP in IIS - php

I am writing a PHP web application that has to connect to a web service using Kerberos 5 authentication (Active Directory). My PHP website is hosted on IIS 7.5 with PHP 5.5. The application pool runs under an account that is authorized in Active Directory and for the target web service.
I tried every code example I could find on this site and on other sites, but to no avail.
This is the PHP code I am using now:
$url = 'http://mywebservice/login/kerberos';
$ch = curl_init();
$options = [
CURLOPT_RETURNTRANSFER => true,
CURLOPT_VERBOSE => true,
CURLOPT_HTTPAUTH => CURLAUTH_GSSNEGOTIATE,
CURLOPT_HTTPHEADER => ['Authorization: Negotiate'],
CURLOPT_RETURNTRANSFER => true,
CURLOPT_USERPWD => 'myuser',
CURLOPT_URL => $url,
CURLOPT_HEADER => 1
];
curl_setopt_array( $ch, $options);
$result = curl_exec($ch);
$header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
$header = substr($result, 0, $header_size);
$body = substr($result, $header_size);
print $result;
This gives me the following message:
HTTP/1.1 302 Found Date: Fri, 21 Oct 2016 14:49:15 GMT X-Robots-Tag: noindex,nofollow WWW-Authenticate: Location: http://mywebservice/login?login_fail Content-Length: 0
When I remove the CURLOPT_HTTPHEADER => ['Authorization: Negotiate'] option, I get an Internal Server Error from the curl module.
When I use curl from the command line, I get the following result:
curl --negotiate http://mywebservice/login/kerberos -umyuser#mydomain --verbose -c "c:\cookie.txt" -b "c:\cookie.txt"
Enter host password for user 'myuser#mydomain':
* Trying (192.168.1.1...
* Connected to mywebservice (192.168.1.1) port 80 (#0)
> GET /login/kerberos HTTP/1.1
> User-Agent: curl/7.41.0
> Host: mywebservice
> Accept: */*
> Cookie: JSESSIONID_PUBLIC=X(MASKED)XXXXXXXXXXXXXXXXX
>
< HTTP/1.1 401 Unauthorized
< Date: Fri, 21 Oct 2016 14:52:46 GMT
< X-Robots-Tag: noindex,nofollow
< WWW-Authenticate: Negotiate
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Last-Modified: Thu, 20 Oct 2016 14:52:46 GMT
< Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Pragma: no-cache
< P3P: CP=CAO PSA OUR
< Content-Type: text/html; charset=UTF-8
< Content-Length: 3643
<
* Ignoring the response-body
* Connection #0 to host mywebservice left intact
* Issue another request to this URL: 'http://mywebservice/login/kerberos'
* Found bundle for host mywebservice: 0xXXXXXXX
* Re-using existing connection! (#0) with host mywebservice
* Connected to mywebservice (192.168.1.1) port 80 (#0)
* Server auth using Negotiate with user 'myuser#mydomain'
> GET /login/kerberos HTTP/1.1
> Authorization: Negotiate X(MASKED)XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXD
w==
> User-Agent: curl/7.41.0
> Host: mywebservice
> Accept: */*
> Cookie: JSESSIONID_PUBLIC=X(MASKED)XXXXXXXXXXXXXXXXX
>
< HTTP/1.1 302 Found
< Date: Fri, 21 Oct 2016 14:52:46 GMT
< X-Robots-Tag: noindex,nofollow
< WWW-Authenticate:
< Location: http://mywebservice/?login_fail
< Content-Length: 0
<
* Connection #0 to host mywebservice left intact
When I test with the KerberosAuthenticationTester tool (http://blog.michelbarneveld.nl/michel/archive/2009/12/05/kerberos-authentication-tester.aspx) it authenticates me right away when I pass the URL and credentials.
I assume that it is not working because I am missing the krb5 library. I could not find it as a DLL, so I tried compiling it together with the PHP source in Visual Studio. That did not work either, because I am missing the config.w32 file. If necessary I can elaborate on that, but first I want to know whether this is really needed.
I also installed MIT Kerberos, but that did not help either.
Is it correct that I need the krb5 DLL, or am I on the wrong track? If I need this DLL, where can I get it or how can I compile it? If there is another solution I would be very happy to hear it.
Thanks, everyone, for taking the time to reply!

Related

Curl request working on localhost but not online

When I send this request from localhost using curl and PHP, it succeeds and returns the output I need:
$url = "https://live12p.hotstar.com/hls/live/2024725/ipl2021/hin/1540008470/15mindvrm01c6817544da534447ba5b5f3760666fd923september2021/master_7.m3u8";
$referer = "https://www.hotstar.com/";
$origin = "https://www.hotstar.com";
$host = "live12p.hotstar.com";
$headers = array();
$headers[] = 'Host: ' . $host;
$headers[] = 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Firefox/91.0';
$headers[] = 'Cookie: hs_uid=5221b879-8857-496d-80bd-691646c0fcae; ajs_anonymous_id=%22e1cac957-8325-4cc5-a4c7-04c9322a50b5%22; ajs_user_id=%22971450d482ae40088b5d834ff952f60b%22; ajs_group_id=null; hdntl=exp=1632448848~acl=*ipl2021*~id=5267bf12e30107702f21c2ea2bf8b874~data=ip%3dwzSX5TdVuh1sa432PD6kIOuXAlfHJY32Vve29D3csZOD8xO2AjRrZpV-userid%3d8kP6OEf3LRYFUhAWTlF2R7ooxuElWlYpTzzEAosrFQCW-did%3dYfixYiH5EvZpALjGAYAzylXejGbgnbBi0BBLUmB93Jgj3HHJzJjbH16-~hmac=3508bd69101ca28bef6b9bb4ff4fb833c344404e3d62968cb38bcabb1d756a71';
$headers[] = 'Referer: ' . $referer;
$headers[] = 'Origin: ' . $origin;
//
$response = get_web_page($url);
echo $response;
function get_web_page($url)
{
global $referer;
global $headers;
$ch = curl_init($url);
$verbose = fopen('curl.txt', 'w+');
$options = array(
CURLOPT_REFERER => $referer,
CURLOPT_HTTPHEADER => $headers,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_ENCODING => "gzip, deflate, br",
CURLOPT_USERAGENT => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Firefox/91.0",
CURLOPT_CONNECTTIMEOUT => 120,
CURLOPT_TIMEOUT => 120,
CURLOPT_SSL_VERIFYPEER => true,
CURLOPT_SSL_VERIFYHOST => 2,
CURLOPT_VERBOSE => TRUE,
CURLOPT_STDERR => $verbose,
);
curl_setopt_array($ch, $options);
$content = curl_exec($ch);
rewind($verbose);
$verboseLog = stream_get_contents($verbose);
echo "Verbose information:\n<pre>", htmlspecialchars($verboseLog), "</pre>\n";
if (curl_errno($ch)) {
// this would be your first hint that something went wrong
die('Couldn\'t send request: ' . curl_error($ch));
} else {
// check the HTTP status code of the request
$resultStatus = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($resultStatus != 200) {
die('Request failed: HTTP status code: ' . $resultStatus);
}
}
curl_close($ch);
return $content;
}
This time the output is what I want:
* Trying 2600:140f:5800::17d7:d73a...
* TCP_NODELAY set
* Connected to live12p.hotstar.com (2600:140f:5800::17d7:d73a) port 443 (#0)
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:#STRENGTH
* successfully set certificate verify locations:
* CAfile: /Applications/XAMPP/xamppfiles/share/curl/curl-ca-bundle.crt
CApath: none
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=IN; ST=Maharashtra; L=Mumbai; O=Novi Digital Entertainment Pvt. Ltd.; CN=*.hotstar.com
* start date: Jul 5 00:00:00 2021 GMT
* expire date: Jul 13 23:59:59 2022 GMT
* subjectAltName: host "live12p.hotstar.com" matched cert's "*.hotstar.com"
* issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA
* SSL certificate verify ok.
> GET /hls/live/2024725/ipl2021/hin/1540008470/15mindvrm01c6817544da534447ba5b5f3760666fd923september2021/master_7.m3u8 HTTP/1.1
Host: live12p.hotstar.com
Accept: */*
Accept-Encoding: gzip, deflate, br
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Firefox/91.0
Cookie: hs_uid=5221b879-8857-496d-80bd-691646c0fcae; ajs_anonymous_id=%22e1cac957-8325-4cc5-a4c7-04c9322a50b5%22; ajs_user_id=%22971450d482ae40088b5d834ff952f60b%22; ajs_group_id=null; hdntl=exp=1632448848~acl=*ipl2021*~id=5267bf12e30107702f21c2ea2bf8b874~data=ip%3dwzSX5TdVuh1sa432PD6kIOuXAlfHJY32Vve29D3csZOD8xO2AjRrZpV-userid%3d8kP6OEf3LRYFUhAWTlF2R7ooxuElWlYpTzzEAosrFQCW-did%3dYfixYiH5EvZpALjGAYAzylXejGbgnbBi0BBLUmB93Jgj3HHJzJjbH16-~hmac=3508bd69101ca28bef6b9bb4ff4fb833c344404e3d62968cb38bcabb1d756a71
Referer: https://www.hotstar.com/
Origin: https://www.hotstar.com
< HTTP/1.1 200 OK
< Akamai-Path-Timestamp: i=1632413538.914;xi=1632413538.921;xo=1632413540.009;s=1632413540.017;
< Content-Encoding: gzip
< Content-Length: 1898
< Last-Modified: Thu, 23 Sep 2021 16:12:18 GMT
< X-Akamai-Live-Origin-QoS: d=4000;t=1632413538.918
< X-Akamai-Server: Akamai-SMT
< Vary: Accept-Encoding
< Akamai-Mon-Iucid-Ing: 2024725
< Expires: Thu, 23 Sep 2021 16:12:21 GMT
< Cache-Control: max-age=0, no-cache, no-store
< Pragma: no-cache
< Date: Thu, 23 Sep 2021 16:12:21 GMT
< Connection: keep-alive
< Content-Type: application/x-mpegURL
< Access-Control-Allow-Origin: https://www.hotstar.com
< Access-Control-Max-Age: 86400
< Access-Control-Allow-Credentials: true
< Access-Control-Expose-Headers: Server,range,hdntl,hdnts,Akamai-Mon-Iucid-Ing,Akamai-Mon-Iucid-Del,X-Reference-Error,X-ErrorType
< Access-Control-Allow-Headers: origin,range,hdntl,hdnts,X-allowRequest
< Access-Control-Allow-Methods: GET,POST,OPTIONS
But when I send the same request online from www.xyz.com, the response is 403 Forbidden. Why?
Verbose information:
* Added hotstar.com:2600:140f:5800::17d7:d73a to DNS cache
* Trying 23.37.230.73:443...
* Connected to live12p.hotstar.com (23.37.230.73) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=IN; ST=Maharashtra; L=Mumbai; O=Novi Digital Entertainment Pvt. Ltd.; CN=*.hotstar.com
* start date: Jul 5 00:00:00 2021 GMT
* expire date: Jul 13 23:59:59 2022 GMT
* subjectAltName: host "live12p.hotstar.com" matched cert's "*.hotstar.com"
* issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA
* SSL certificate verify ok.
> GET /hls/live/2024725/ipl2021/hin/1540008470/15mindvrm01c6817544da534447ba5b5f3760666fd923september2021/master_7.m3u8 HTTP/1.1
Host: live12p.hotstar.com
Accept: */*
Cookie: hs_uid=5221b879-8857-496d-80bd-691646c0fcae; ajs_anonymous_id=%22e1cac957-8325-4cc5-a4c7-04c9322a50b5%22; ajs_user_id=%22971450d482ae40088b5d834ff952f60b%22; ajs_group_id=null; hdntl=exp=1632451493~acl=*ipl2021*~id=e1acc4c006ec4a10fb2422774e4b9806~data=ip%3dwzSX5TdVuh1sa432PD6kIOuXAlfHJY32Vve29D3csZOD8xO2AjRrZpV-userid%3d8kP6OEf3LRYFUhAWTlF2R7ooxuElWlYpTzzEAosrFQCW-did%3dYfixYiH5EvZpALjGAYAzylXejGbgnbBi0BBLUmB93Jgj3HHJzJjbH16-~hmac=c9248d2c8ea8b9e0420ad176918a1ea25561bab8af14b444594515491d00fb6e
Referer: https://www.hotstar.com/
Origin: https://www.hotstar.com
X-FORWARDED-FOR: 223.190.135.254
* old SSL session ID is stale, removing
* Mark bundle as not supporting multiuse
< HTTP/1.1 403 Forbidden
< Server: AkamaiGHost
< Mime-Version: 1.0
< Content-Type: text/html
< Content-Length: 416
< X-Reference-Error: 18.45e62517.1632413497.90e1a77
< Expires: Thu, 23 Sep 2021 16:11:37 GMT
< Cache-Control: max-age=0, no-cache, no-store
< Pragma: no-cache
< Date: Thu, 23 Sep 2021 16:11:37 GMT
< Connection: keep-alive
< Country: IN
< X-ErrorType: geo-blocked
< Access-Control-Allow-Origin: https://www.hotstar.com
< Access-Control-Max-Age: 86400
< Access-Control-Allow-Credentials: true
< Access-Control-Expose-Headers: Server,range,hdntl,hdnts,Akamai-Mon-Iucid-Ing,Akamai-Mon-Iucid-Del,X-Reference-Error,X-ErrorType
< Access-Control-Allow-Headers: origin,range,hdntl,hdnts,X-allowRequest
< Access-Control-Allow-Methods: GET,POST,OPTIONS
<
* Connection #0 to host live12p.hotstar.com left intact
Request failed: HTTP status code: 403
I don't know why this is happening or how to solve it.
If anyone has an idea, please help me solve this.
The service you're trying to access is behind Akamai's network, and it appears they're geo-restricting access.
Note the response header:
X-ErrorType: geo-blocked
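To surface this kind of rejection in the PHP script itself, rather than reading the verbose log by hand, a minimal sketch like the following could pull the X-ErrorType value out of the raw response headers. The helper name findResponseHeader is hypothetical, and the sketch assumes the headers are available as one string, for example captured with CURLOPT_HEADER enabled.

```php
<?php
// Hypothetical helper: return the value of one response header
// (case-insensitive name match) from a raw header block, or null if absent.
function findResponseHeader(string $rawHeaders, string $name): ?string
{
    foreach (preg_split('/\r\n|\n/', $rawHeaders) as $line) {
        // Split "Name: value" at the first colon only.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2 && strcasecmp(trim($parts[0]), $name) === 0) {
            return trim($parts[1]);
        }
    }
    return null;
}

// Sample lines taken from the 403 response shown in the question.
$raw = "HTTP/1.1 403 Forbidden\r\nServer: AkamaiGHost\r\nCountry: IN\r\nX-ErrorType: geo-blocked\r\n\r\n";
$errorType = findResponseHeader($raw, 'X-ErrorType');
if ($errorType !== null) {
    echo "Request rejected: {$errorType}\n";
}
```

A check like this only reports the reason; it does not change how the CDN treats the server's IP address.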

PHP Curl Error 35 - Peer reports incompatible or unsupported protocol version

I'm attempting to make POST requests to an external API (code below). The curl code works on my localhost, but on my staging server curl returns the error "Peer reports incompatible or unsupported protocol version" (35). From what I have read, I need to add an error buffer to get more debugging information, but the documentation is confusing.
Yet when I make a curl request directly from the server's terminal, the connection succeeds. This confuses me, as the server is clearly capable of making curl requests.
PHP Curl Code
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_URL => $url,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_ENCODING => "",
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 30,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => "POST",
CURLOPT_POSTFIELDS => json_encode($body),
CURLOPT_HTTPHEADER => $header,
));
$response = curl_exec($curl);
$err = curl_error($curl);
echo "response: " . $response;
echo "<br><br>error: " . $err;
curl_close($curl);
Server Curl Response
curl https://support.zendesk.com/api/v2/users/create_or_update.json
* About to connect() to support.zendesk.com port 443 (#0)
* Trying 104.16.51.111...
* Connected to support.zendesk.com (104.16.51.111) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* SSL connection using TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
* Server certificate:
* subject: CN= support.zendesk.com,O="CloudFlare, Inc.",L=San Francisco,ST=CA,C=US
* start date: Mar 08 00:00:00 2019 GMT
* expire date: Mar 08 12:00:00 2020 GMT
* common name: support.zendesk.com
* issuer: CN=CloudFlare Inc ECC CA-2,O="CloudFlare, Inc.",L=San Francisco,ST=CA,C=US
> GET /api/v2/users/create_or_update.json HTTP/1.1
> User-Agent: curl/7.29.0
> Host: support.zendesk.com
> Accept: */*
>
< HTTP/1.1 401 Unauthorized
< Date: Fri, 12 Apr 2019 12:52:28 GMT
< Content-Type: application/json; charset=UTF-8
< Content-Length: 37
< Connection: keep-alive
< Set-Cookie: __cfduid=da0ecd56691c96b9b3dac091df58383d51555073548; expires=Sat, 11-Apr-20 12:52:28 GMT; path=/; domain=.ralphandrussoclientcare.zendesk.com; HttpOnly
< WWW-Authenticate: Basic realm="Web Password"
< Strict-Transport-Security: max-age=31536000;
< Cache-Control: no-cache
< X-Zendesk-Origin-Server: app23.pod17.euw1.zdsys.com
< X-Request-Id: 4c65566eacc82981-DUB
< X-Runtime: 0.032000
< X-Zendesk-Request-Id: 3360f95a861586e6f414
< Set-Cookie: __cfruid=7af98f1cbac97922c1c15b82f7c133c3945a446e-1555073548; path=/; domain=.ralphandrussoclientcare.zendesk.com; HttpOnly
< Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< Server: cloudflare
< CF-RAY: 4c65566eacc82981-DUB
<
* Connection #0 to host support.zendesk.com left intact
{"error":"Couldn't authenticate you"}
To solve this problem I ran a server SSL test using https://www.ssllabs.com/ssltest/, which showed me which protocols were already open and available on the server.
From there I followed the answer to the question "TLS 1.2 not working in cURL", which showed me which PHP CURLOPT_SSLVERSION value to use in order to access an open protocol.
Therefore I had to add the following line to my curl options array:
CURLOPT_SSLVERSION => CURL_SSLVERSION_TLSv1_2
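For context, here is a sketch of how that option might be merged into an existing option array. The helper name withTls12 is hypothetical, and the CURL_SSLVERSION_TLSv1_2 constant is only defined when the curl extension is loaded and reasonably recent, so the example guards on that.

```php
<?php
// Hypothetical helper: add the TLS 1.2 setting to an option array without
// overwriting anything the caller already set (array union keeps left keys).
function withTls12(array $options): array
{
    return $options + [CURLOPT_SSLVERSION => CURL_SSLVERSION_TLSv1_2];
}

if (extension_loaded('curl')) {
    $options = withTls12([
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 30,
    ]);
    $curl = curl_init('https://support.zendesk.com/api/v2/users/create_or_update.json');
    $applied = curl_setopt_array($curl, $options);
    echo $applied ? "options applied\n" : "could not apply options\n";
    // curl_exec($curl) would follow here; checking curl_errno($curl) afterwards
    // would show whether error 35 still occurs.
}
```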

What's the best way to make a webservices in joomla?

Hi guys, today I have an interesting question.
What's the best way to make a web service in Joomla?
I'm trying to build a web service in Joomla and I have the following problem.
This is the view, in components/com_webservice/view/view.json.php:
<?php
defined('_JEXEC') or die('Restricted access');
jimport('joomla.application.component.view');
class WebServicesViewServices extends JViewLegacy {
private $data;
function __construct($config = array()) {
JLoader::import('models.services', JPATH_COMPONENT);
$model = new WebServicesModelServices();
if ($model->errors) {
echo json_encode($model->errors);
jexit();
}else{
$this->data = array('iphone' => '5s','iphone' => '6','iphone' => '6s','iphone' => '6s plus');
}
parent::__construct($config);
}
function display($tpl = null) {
echo json_encode($this->data);
}
}
?>
The problem is that when I execute curl http://wsn.jserver/index.php?option=com_services&format=json
to consume this service, I get the following response:
* Connected to wsn.jserver (127.0.0.1) port 80 (#0)
> GET /index.php?option=com_jserver HTTP/1.1
> Host: wsn.jserver
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 303 See other
< Date: Fri, 23 Oct 2015 02:40:37 GMT
< Server: Apache/2.4.16 (Unix) PHP/5.5.30
< X-Powered-By: PHP/5.5.30
< Set-Cookie: 4dbb8abeb5e7919ee73c8545901d5f62=d6ksd6e93t99q7hsk8cf10hq35; path=/; HttpOnly
< Set-Cookie: e909c2d7067ea37437cf97fe11d91bd0=DO
< Location: http://wsn.jserver/index.php?lang=es
< Content-Length: 0
< Content-Type: text/html; charset=utf-8
<
* Connection #0 to host wsn.jserver left intact
How can I make this work?
What is the best way to make web services in Joomla?
Have you checked the JoomlaTools Framework? From the linked page:
Designed around the HTTP protocol. Each component automatically provides a level 3 JSON REST API out of the box, no extra coding required.
Solved!
I've found the problem.
Joomla redirects by default in order to select a language.
In my case, I duplicated the languagefilter plugin and added a check so that it does not redirect when option="com_services".
Now when I execute the command "curl -v http://wsn.jserver/index.php?option=com_webervices" the response is:
* Trying 127.0.0.1...
* Connected to wsn.pawad (127.0.0.1) port 80 (#0)
> GET /index.php?option=com_pawaservices HTTP/1.1
> Host: wsn.jserver
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 23 Oct 2015 18:48:09 GMT
< Server: Apache/2.4.16 (Unix) PHP/5.5.30
< X-Powered-By: PHP/5.5.30
< Set-Cookie: 4dbb8abeb5e7919ee73c8545901d5f62=6a8m8cdte288k3jp2kvefmfe07; path=/; HttpOnly
< Content-Length: 16
< Content-Type: text/html
<
* Connection #0 to host wsn.pawad left intact
{"iphone":"5s", "iphone":"6", "iphone":"6s", "iphone":"6s plus"}
In conclusion, the problem is caused by Joomla's redirect. To solve it, you can hack the languagefilter plugin (plugins/system/languagefilter/languagefilter.php) or create a new plugin.
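One more detail worth checking, independent of the redirect: the view in the question builds its data with the key 'iphone' repeated four times. In PHP a repeated array key overwrites the earlier entry, so normally only the last value survives. A minimal sketch of one alternative, a plain list under a single key (the variable names are illustrative, not from the original component):

```php
<?php
// A repeated key overwrites earlier entries, so this array has one element.
$duplicateKeys = array('iphone' => '5s', 'iphone' => '6', 'iphone' => '6s', 'iphone' => '6s plus');
echo json_encode($duplicateKeys), "\n"; // {"iphone":"6s plus"}

// Storing the models as a list under one key keeps all four values.
$models = array('iphone' => array('5s', '6', '6s', '6s plus'));
echo json_encode($models), "\n"; // {"iphone":["5s","6","6s","6s plus"]}
```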

xml download - blocked

I am trying to download an XML file from a Polish website. It worked for the first few days, but then I could no longer download the file to my server (although I can still open and download it on my computer). The file on my server, which should contain the XML, instead contains an HTML page telling me that I have been blocked.
I contacted the webmaster of the website I want to get the XML from, and he told me that I am not blocked by IP address. So the question is: what should I send in the headers, or what else should I do, to download this file?
My code to download the XML file is below, and this is the XML I want to download: http://www.polskatimes.pl/rss/fakty_kraj.xml
$headers[] = "User-Agent:Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";
$headers[] = "Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$headers[] = "Accept-Language:pl-PL,pl;q=0.8";
$headers[] = "Accept-Encoding:gzip,deflate,sdch";
$headers[] = "Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$headers[] = "Keep-Alive:115";
$headers[] = "Connection:keep-alive";
$headers[] = "Cache-Control:max-age=0";
$xml_data = file_get_contents($xml,false,stream_context_create(
array("http" => array('header' => $headers)))); // your file is in the string "$xml" now.
file_put_contents($xml_md5, $xml_data); // now your xml file is saved.
Request the URL in verbose mode (-v):
* About to connect() to www.polskatimes.pl port 80 (#0)
* Trying 195.8.99.38... connected
* Connected to www.polskatimes.pl (195.8.99.38) port 80 (#0)
> GET /rss/fakty_kraj.xml HTTP/1.1
> User-Agent: curl/7.21.0 (x86_64-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.15 libssh2/1.2.6
> Host: www.polskatimes.pl
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx
< Date: Thu, 18 Apr 2013 10:40:15 GMT
< Content-Type: text/html; charset=utf8
< Transfer-Encoding: chunked
< Connection: close
< Vary: Accept-Encoding
< Expires: Thu, 18 Apr 2013 10:40:15 GMT
< Cache-Control: max-age=0
(html page with message that I am temporary blocked)
* Closing connection #0
To see what happens behind the scenes (and which headers you actually need), you have to analyze the exchange a little. There is nothing magic about it: you can do it on the command line with a program called curl, which is available for many (perhaps all) computer platforms.
The first step is usually to request the URL in verbose mode (-v):
$ curl -v http://www.polskatimes.pl/rss/fakty_kraj.xml
* About to connect() to www.polskatimes.pl port 80 (#0)
* Trying 195.8.99.38... connected
* Connected to www.polskatimes.pl (195.8.99.38) port 80 (#0)
> GET /rss/fakty_kraj.xml HTTP/1.1
> User-Agent: curl/7.21.1 (i686-pc-mingw32) libcurl/7.21.1 OpenSSL/0.9.8r zlib/1.2.3
> Host: www.polskatimes.pl
> Accept: */*
>
< HTTP/1.1 302 Found
< Date: Wed, 17 Apr 2013 17:39:51 GMT
< Server: Apache
< Set-Cookie: sprawdz_cookie=1; expires=Thu, 17-Apr-2014 17:39:51 GMT
< Location: http://www.polskatimes.pl/rss/fakty_kraj.xml?cookie=1
< Vary: Accept-Encoding
< Content-Length: 0
< Connection: close
< Content-Type: text/html; charset=iso-8859-2
<
* Closing connection #0
That shows you the request headers (prefixed with >), the response headers (prefixed with <) and the response body (empty in this case). As you can see, the status is 302 Found, which like every 3xx status means a redirect, and the Location header tells you where to:
Location: http://www.polskatimes.pl/rss/fakty_kraj.xml?cookie=1
As the query parameter suggests, this is a cookie-check. The cookie itself is set as well:
Set-Cookie: sprawdz_cookie=1; expires=Thu, 17-Apr-2014 17:39:51 GMT
So in the next step we will replay the last command but this time setting the cookie which can be done with the -b argument:
$ curl -v -b prawdz_cookie=1 http://www.polskatimes.pl/rss/fakty_kraj.xml
* About to connect() to www.polskatimes.pl port 80 (#0)
* Trying 195.8.99.38... connected
* Connected to www.polskatimes.pl (195.8.99.38) port 80 (#0)
> GET /rss/fakty_kraj.xml HTTP/1.1
> User-Agent: curl/7.21.1 (i686-pc-mingw32) libcurl/7.21.1 OpenSSL/0.9.8r zlib/1.2.3
> Host: www.polskatimes.pl
> Accept: */*
> Cookie: prawdz_cookie=1
>
< HTTP/1.1 200 OK
< Date: Wed, 17 Apr 2013 17:43:52 GMT
< Server: Apache
< Set-Cookie: sesja_gratka=e38fa0eb93705c8de7ae906198494439; expires=Wed, 24-Apr-2013 17:43:52 GMT; path=/; domain=polskatimes.pl
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Pragma: no-cache
< Vary: Accept-Encoding
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: text/xml; charset=utf-8
<
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title><![CDATA[Fakty - Kraj]]></title>
<link>http://www.polskatimes.pl/fakty/kraj/</link>
<atom:link href="http://www.polskatimes.pl/rss/fakty_kraj.xml" rel="self" type="application/rss+xml"/>
<description><![CDATA[Materiały z działu Kraj]]></description>
... (truncated)
So this is immediately successful. And now the really good part: you know that you need to set the cookie for the request, and curl has already shown you all the headers it used:
> GET /rss/fakty_kraj.xml HTTP/1.1
> User-Agent: curl/7.21.1 (i686-pc-mingw32) libcurl/7.21.1 OpenSSL/0.9.8r zlib/1.2.3
> Host: www.polskatimes.pl
> Accept: */*
> Cookie: prawdz_cookie=1
With file_get_contents you do not need to care about most of them: neither the first line nor the Host: and Accept: lines.
The User-Agent: header does not seem to play a role, since curl's own is accepted.
So all that is left is the Cookie: header. Let's try it in PHP:
$ php -r "echo file_get_contents('http://www.polskatimes.pl/rss/fakty_kraj.xml', null,
stream_context_create(['http'=>['header'=>['Cookie: prawdz_cookie=1']]]));"
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title><![CDATA[Fakty - Kraj]]></title>
<link>http://www.polskatimes.pl/fakty/kraj/</link>
<atom:link href="http://www.polskatimes.pl/rss/fakty_kraj.xml" rel="self"
type="application/rss+xml"/>
... (truncated)
And this is the direct test that only the Cookie: prawdz_cookie=1 header is needed.
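Rather than hard-coding the cookie, the same two steps could be automated: read the Set-Cookie lines from the first response and replay them as a Cookie request header. The sketch below covers only the header handling. The function name cookieHeaderFrom is hypothetical, and the input is assumed to be an array of header lines such as the one PHP exposes in $http_response_header.

```php
<?php
// Hypothetical helper: turn the Set-Cookie lines of a response into a single
// "Cookie: name=value; name2=value2" request header. Attributes such as
// expires or path are dropped. Returns null when no cookie was set.
function cookieHeaderFrom(array $responseHeaders): ?string
{
    $pairs = [];
    foreach ($responseHeaders as $line) {
        if (stripos($line, 'Set-Cookie:') === 0) {
            $value = trim(substr($line, strlen('Set-Cookie:')));
            // Keep only the leading name=value pair.
            $pairs[] = trim(explode(';', $value, 2)[0]);
        }
    }
    return $pairs ? 'Cookie: ' . implode('; ', $pairs) : null;
}

// Header lines copied from the 302 response shown earlier in this answer.
$headers = [
    'HTTP/1.1 302 Found',
    'Set-Cookie: sprawdz_cookie=1; expires=Thu, 17-Apr-2014 17:39:51 GMT',
    'Location: http://www.polskatimes.pl/rss/fakty_kraj.xml?cookie=1',
];
echo cookieHeaderFrom($headers), "\n"; // Cookie: sprawdz_cookie=1
```

The returned string could then be passed as the 'header' entry of the http context given to stream_context_create, in the same way as the hard-coded header in the php -r example.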

PHP: Validating URL with Curl

$fileSource = "http://google.com";
$ch = curl_init($fileSource);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_exec($ch);
$retcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($retcode != 200) {
$error .= "The source specified is not a valid URL.";
}
curl_close($ch);
Here's my issue. When I use the above and set $fileSource = "http://google.com"; it does not work, whereas if I set it to $fileSource = "http://www.google.com/"; it works.
What's the issue?
One permanently redirects (301) to the www. domain, while the other one just replies OK (200).
Why are you considering only the 200 status code as valid? Let cURL handle that for you:
curl_setopt($ch, CURLOPT_FAILONERROR, true);
From the manual:
TRUE to fail silently if the HTTP code returned is greater than or
equal to 400. The default behavior is to return the page normally,
ignoring the code.
Try explicitly telling curl to follow redirects:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
If that doesn't work you may need to spoof a user agent on some sites.
Also, if they are using JS redirects you're out of luck.
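Putting these suggestions together, a minimal sketch of the check might look like this. The function names isSuccessStatus and urlResponds are hypothetical, and treating any 2xx final status as "valid" after following redirects is an assumption about what the check should accept.

```php
<?php
// Hypothetical helper: treat any 2xx final status as success.
function isSuccessStatus(int $code): bool
{
    return $code >= 200 && $code < 300;
}

// Hypothetical helper: send a body-less request, follow redirects, and
// report whether the final response was successful.
function urlResponds(string $url): bool
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow 301/302
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);          // avoid redirect loops
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo anything
    $ok = curl_exec($ch) !== false;                  // false means a transport error
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return $ok && isSuccessStatus($code);
}
```

With redirects followed, CURLINFO_HTTP_CODE reports the status of the last response in the chain, so a 301 from the bare domain no longer fails the check by itself.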
What you're seeing is actually a result of a 301 redirect. Here's what I got back using a verbose curl from the command line
curl -vvvvvv http://google.com
* About to connect() to google.com port 80 (#0)
* Trying 173.194.43.34...
* connected
* Connected to google.com (173.194.43.34) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.25.0 (x86_64-apple-darwin11.3.0) libcurl/7.25.0 OpenSSL/1.0.1b zlib/1.2.6 libidn/1.22
> Host: google.com
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/
< Content-Type: text/html; charset=UTF-8
< Date: Fri, 04 May 2012 04:03:59 GMT
< Expires: Sun, 03 Jun 2012 04:03:59 GMT
< Cache-Control: public, max-age=2592000
< Server: gws
< Content-Length: 219
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
<
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
here.
</BODY></HTML>
* Connection #0 to host google.com left intact
* Closing connection #0
However, if you do a curl on the actual www.google.com suggested in the 301 redirect, you'll get the following.
curl -vvvvvv http://www.google.com
* About to connect() to www.google.com port 80 (#0)
* Trying 74.125.228.19...
* connected
* Connected to www.google.com (74.125.228.19) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.25.0 (x86_64-apple-darwin11.3.0) libcurl/7.25.0 OpenSSL/1.0.1b zlib/1.2.6 libidn/1.22
> Host: www.google.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 04 May 2012 04:05:25 GMT
< Expires: -1
< Cache-Control: private, max-age=0
< Content-Type: text/html; charset=ISO-8859-1
I've truncated the remainder of google's response just to show the primary difference in the 200 OK vs 301 REDIRECT
