How can I get HTTP headers (Location) from some URL?

I have an address (for example: http://example.com/b-out/3456/3212/) that I must request through cURL. I know that this URL redirects to another URL (like http://sdss.co/go/36a7fe71189fec14c85636f33501f6d2/?...). That second URL is in the Location header of the first URL's response. How can I get the second URL into a variable?

Perform a request to the first URL, confirm a redirect takes place and read the Location header. Drawing on the questions "PHP cURL retrieving response headers AND body in a single request?" and "Check headers in PHP cURL server response":
$curlHandle = curl_init();
curl_setopt($curlHandle, CURLOPT_URL, $url);
curl_setopt($curlHandle, CURLOPT_HEADER, 1);
curl_setopt($curlHandle, CURLOPT_NOBODY, 1);
curl_setopt($curlHandle, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($curlHandle, CURLOPT_RETURNTRANSFER, 1);
$redirectResponse = curl_exec($curlHandle);
The options being set there mean: return the response headers, don't return the response body, don't automatically follow redirects and return the result in the exec-call.
Now you've got the HTTP response headers, without body, in $redirectResponse. You'll now need to verify that it's a redirect:
$statusCode = curl_getinfo($curlHandle, CURLINFO_HTTP_CODE);
if ($statusCode == 301 || $statusCode == 302 || $statusCode == 303)
{
$headerLength = curl_getinfo($curlHandle, CURLINFO_HEADER_SIZE);
$responseHeaders = substr($redirectResponse, 0, $headerLength);
$redirectUrl = getLocationHeader($responseHeaders);
}
Then create a function that extracts the Location header:
function getLocationHeader($responseHeaders)
{
}
In there you'll want to explode() the $responseHeaders on HTTP newline (\r\n) and find the header starting with location.
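The explode() approach described above might be sketched as follows. This is only an illustration: the function name findLocationHeader and the sample header string are made up for the example and are not part of the original answer.

```php
<?php
/**
 * Return the value of the Location header from a raw header block,
 * or null when no such header is present.
 * (Hypothetical helper name, chosen for this sketch.)
 */
function findLocationHeader(string $responseHeaders): ?string
{
    // HTTP header lines are separated by CRLF.
    foreach (explode("\r\n", $responseHeaders) as $line) {
        // Header names are case-insensitive, so use a case-insensitive prefix check.
        if (stripos($line, 'location:') === 0) {
            return trim(substr($line, strlen('location:')));
        }
    }
    return null;
}

// Made-up sample response headers for demonstration.
$sample = "HTTP/1.1 302 Found\r\nServer: Apache\r\nLocation: http://example.com/target\r\n\r\n";
echo findLocationHeader($sample), "\n";
```

Returning null rather than an empty string lets the caller tell "no Location header" apart from an empty header value.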
Alternatively, you can use a more abstract HTTP client library like Zend_Http_Client, where it is a little easier to obtain the headers.

I did it like CodeCaster said. This is my function 'getLocationHeader':
function getLocationHeader($responseHeaders)
{
// Match the Location line itself instead of relying on a Vary header following it
if (preg_match('/^Location:\s*(.+)$/mi', $responseHeaders, $loc))
{
$location = trim($loc[1]);
return $location;
}
return FALSE;
}

Related

Check if File Exists on Amazon s3 Signed URL

I have created a signed $URL for Amazon S3 and it opens perfectly in the browser.
http://testbucket.com.s3.amazonaws.com/100-game-play-intro-1.m4v?AWSAccessKeyId=AKIAJUAjhkhkjhMO73BF5Q&Expires=1378465934&Signature=ttmsAUDgJjCXepwEXvl8JdFu%2F60%3D
**Bucket name and accesskey changed in this example
I am however trying to then use the function below to check (using curl) that the file exists. It fails the CURL connection. If I replace $URL above with the url of an image outside of s3 then this code works perfectly.
I know the file exists in amazon but can't work out why this code fails if using a signed url as above
Any ideas?
Thanks
Here is my code.
function remoteFileExists($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, false);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
//don't fetch the actual file, only get header to check if file exists
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, true);
$result = curl_exec($ch);
curl_close($ch);
if ($result !== false) {
$statusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($statusCode == 200) {
$ret = true;
} else {
$ret = false;
}
} else {
$ret='connection failed';
}
return $ret;
}

When using CURLOPT_NOBODY, libcurl sends an HTTP HEAD request, not a GET request.
...the string to be signed is formed by appending the REST verb, content-md5 value, content-type value, expires parameter value, canonicalized x-amz headers (see recipe below), and the resource; all separated by newlines.
— http://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html
The "REST verb" -- e.g., GET vs HEAD -- must be consistent between the signature you generate and the request that you make, so a signature that is valid for GET will not be valid for HEAD and vice versa.
You will need to sign a HEAD request instead of a GET request in order to validate a file in this way.
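To see why the verb matters, here is a minimal sketch of the query-string signing scheme quoted above (HMAC-SHA1 over the newline-separated string). It assumes empty Content-MD5 and Content-Type values and no x-amz headers; the function name signS3Request, the secret key, and the resource path are placeholders invented for the example. Newer AWS signature versions work differently.

```php
<?php
/**
 * Build the query-string signature for one HTTP verb, following the
 * string-to-sign layout quoted above.
 * $resource is the canonicalized resource, e.g. "/bucket/key".
 * (Hypothetical helper name, chosen for this sketch.)
 */
function signS3Request(string $verb, string $resource, int $expires, string $secretKey): string
{
    // Verb, Content-MD5 (empty), Content-Type (empty), Expires, then the resource.
    // No x-amz headers are assumed, so nothing is inserted before the resource.
    $stringToSign = implode("\n", [$verb, '', '', (string) $expires, $resource]);

    // Raw HMAC-SHA1 digest, base64-encoded.
    return base64_encode(hash_hmac('sha1', $stringToSign, $secretKey, true));
}

$secret   = 'EXAMPLE-SECRET-KEY';                         // placeholder, not a real key
$expires  = 1378465934;                                   // Expires value from the question
$resource = '/testbucket.com/100-game-play-intro-1.m4v';  // assumed bucket/key layout

$getSignature  = signS3Request('GET', $resource, $expires, $secret);
$headSignature = signS3Request('HEAD', $resource, $expires, $secret);

// The two signatures differ, so a URL signed for GET is rejected for HEAD.
var_dump($getSignature === $headSignature);
```

The signature still has to be URL-encoded before it is placed in the Signature query parameter.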

You can check by the header part.
$full_url = 'https://www.example.com/image.jpg';
$file_headers = @get_headers($full_url);
if($file_headers && strpos($file_headers[0], '200 OK')){
// enter code here
}
Or, if you are using AWS S3, you can also use this:
if(!class_exists('S3')){
require('../includes/s3/S3.php');
}
S3::setAuth(awsAccessKey, awsSecretKey);
$info = S3::getObjectInfo($bucketName, $s3_furl);
// check for $info value and apply your condition.

Get the url, a given url redirects to

I mine data from rss links and get a bunch of urls like:
http://feedproxy.google.com/~r/electricpig/~3/qoF8XbocUbE/
.... and if I access the links in my web browser, I am redirected to something like:
http://www.electricpig.co.uk/stuff.
Is there a way in php to write a function that, when given a url "a" that redirects the user to an url "b", returns you the url "b" ?

Here you go:
function getRedirect($oldUrl) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $oldUrl);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$res = curl_exec($ch);
$newUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
return $newUrl;
}
The function requires cURL and makes use of CURLINFO_EFFECTIVE_URL, which is described in the PHP documentation for curl_getinfo.
EDIT:
If you are certain the oldUrl is not redirecting to newUrl via JavaScript, then you can also avoid fetching the body of the newUrl using
curl_setopt($ch, CURLOPT_NOBODY, TRUE); // remove body
Put the above line before $res = curl_exec($ch); in the function getRedirect to achieve faster execution.

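function getRedirect($url) {
$headers = get_headers($url, 1);
// get_headers() returns false on failure
if ($headers !== false && array_key_exists("Location", $headers)) {
// With several redirects, Location is an array; use the last entry
$location = is_array($headers["Location"]) ? end($headers["Location"]) : $headers["Location"];
$url = getRedirect($location);
}
return $url;
}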

accept remote domain get/post request

I am working on a Linux server that doesn't accept GET/POST requests from remote domains. For example, if I use a form on another domain and post it to a script on this server, the script doesn't process it. Which options do I need to enable so that it accepts remote requests? Is it something in php.ini?
Regards

If the webserver blocks the posts via referrer, you would need to find a way to send a referrer from your web site. Sending the post to a script first and from there to your site would give you the possibility to fake the referrer request header.
The following code of an example php proxy is borrowed from here: http://snipplr.com/view/16058/php-url-proxy/
<?php
// PHP Proxy
// Responds to both HTTP GET and POST requests
//
// Author: Abdul Qabiz
// March 31st, 2006
//
// Get the url of to be proxied
// Is it a POST or a GET?
$url = ($_POST['url']) ? $_POST['url'] : $_GET['url'];
$headers = ($_POST['headers']) ? $_POST['headers'] : $_GET['headers'];
$mimeType =($_POST['mimeType']) ? $_POST['mimeType'] : $_GET['mimeType'];
//Start the Curl session
$session = curl_init($url);
// If it's a POST, put the POST data in the body
if ($_POST['url']) {
$postvars = '';
while ($element = current($_POST)) {
$postvars .= key($_POST).'='.$element.'&';
next($_POST);
}
curl_setopt ($session, CURLOPT_POST, true);
curl_setopt ($session, CURLOPT_POSTFIELDS, $postvars);
}
// Don't return HTTP headers. Do return the contents of the call
curl_setopt($session, CURLOPT_HEADER, ($headers == "true") ? true : false);
curl_setopt($session, CURLOPT_FOLLOWLOCATION, true);
//curl_setopt($ch, CURLOPT_TIMEOUT, 4);
curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
// Make the call
$response = curl_exec($session);
// NOTE: HERE YOU WILL OVERRIDE THE REFERRER REQUEST HEADER
if ($mimeType != "")
{
// The web service returns XML. Set the Content-Type appropriately
header("Content-Type: ".$mimeType);
}
echo $response;
curl_close($session);
?>

PHP cURL: Get target of redirect, without following it

The curl_getinfo function returns a lot of metadata about the result of an HTTP request. However, for some reason it doesn't include the bit of information I want at the moment, which is the target URL if the request returns an HTTP redirection code.
I'm not using CURLOPT_FOLLOWLOCATION because I want to handle specific redirect codes as special cases.
If cURL can follow redirects, why can't it tell me what they redirect to when it isn't following them?
Of course, I could set the CURLOPT_HEADER flag and pick out the Location header. But is there a more efficient way?

This can be done in 4 steps:
Step 1. Initialise curl
$ch = curl_init(); //initialise the curl handle
//COOKIESESSION is optional, use if you want to keep cookies in memory
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
Step 2. Get the headers for $url
curl_setopt($ch, CURLOPT_URL, $url); //specify your URL
curl_setopt($ch, CURLOPT_HEADER, true); //include headers in http data
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); //don't follow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); //return the response instead of printing it
$http_data = curl_exec($ch); //hit the $url
$curl_info = curl_getinfo($ch);
$headers = substr($http_data, 0, $curl_info['header_size']); //split out header
Step 3. Check if you have the correct response code
if (!($curl_info['http_code']>299 && $curl_info['http_code']<309)) {
//return, echo, die, whatever you like
return 'Error - http code'.$curl_info['http_code'].' received.';
}
Step 4. Parse the headers to get the new URL
preg_match("!\r\n(?:Location|URI): *(.*?) *\r\n!i", $headers, $matches); //header names are case-insensitive
$url = $matches[1];
Once you have the new URL you can then repeat steps 2-4 as often as you like.
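Repeating steps 2-4 could be wrapped in a loop with a hop limit, roughly as sketched below. The names resolveRedirects and $fetch are invented for this example. To keep the sketch runnable without network access, the HTTP request is passed in as a callable and a canned lookup table stands in for it. In real use, $fetch would perform steps 2-4 and return the status code and Location value.

```php
<?php
/**
 * Follow redirects by hand, at most $maxHops times.
 * $fetch takes a URL and returns [statusCode, locationOrNull].
 * (Hypothetical helper, chosen for this sketch.)
 */
function resolveRedirects(string $url, callable $fetch, int $maxHops = 10): string
{
    for ($hop = 0; $hop < $maxHops; $hop++) {
        [$status, $location] = $fetch($url);

        // Stop on anything that is not a redirect carrying a Location header.
        if ($status < 300 || $status > 308 || $location === null) {
            return $url;
        }
        $url = $location;
    }
    throw new RuntimeException("Gave up after $maxHops redirects");
}

// Stand-in for steps 2-4: canned responses instead of real requests.
$responses = [
    'http://a.example/' => [301, 'http://b.example/'],
    'http://b.example/' => [302, 'http://c.example/'],
    'http://c.example/' => [200, null],
];
$fakeFetch = function (string $url) use ($responses): array {
    return $responses[$url];
};

echo resolveRedirects('http://a.example/', $fakeFetch), "\n";
```

The hop limit guards against redirect loops. The sketch does not resolve relative Location values against the current URL, which a real implementation would need to handle.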

You can simply use CURLINFO_REDIRECT_URL:
$info = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
echo $info; // the redirect URL without following it
As you mentioned, disable the CURLOPT_FOLLOWLOCATION option (before executing) and place this code after executing.
CURLINFO_REDIRECT_URL - With the CURLOPT_FOLLOWLOCATION option
disabled: redirect URL found in the last transaction, that should be
requested manually next. With the CURLOPT_FOLLOWLOCATION option
enabled: this is empty. The redirect URL in this case is available in
CURLINFO_EFFECTIVE_URL
Reference: PHP manual entry for curl_getinfo

curl doesn't seem to have a function or option to get the redirect target, but it can be extracted using various techniques:
From the response:
Apache can respond with a HTML page in case of a 301 redirect (Doesn't seem to be the case with 302's).
If the response has a format similar to:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved here.</p>
<hr>
<address>Apache/2.2.16 (Debian) Server at www.xxx.yyy Port 80</address>
</body></html>
You can extract the redirect URL using DOMXPath:
$i = 0;
foreach($urls as $url) {
if(substr($url,0,4) == "http") {
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$result = @curl_exec($c);
$status = curl_getinfo($c,CURLINFO_HTTP_CODE);
curl_close($c);
$results[$i]['code'] = $status;
$results[$i]['url'] = $url;
if($status === 301) {
$xml = new DOMDocument();
$xml->loadHTML($result);
$xpath = new DOMXPath($xml);
$href = $xpath->query("//*[@href]")->item(0);
$results[$i]['target'] = $href->attributes->getNamedItem('href')->nodeValue;
}
$i++;
}
}
Using CURLOPT_NOBODY
There is a faster way, however, as @gAMBOOKa points out: using CURLOPT_NOBODY. This approach sends a HEAD request instead of a GET (it does not download the actual content, so it should be faster and more efficient) and stores the response header.
Using a regex the target URL can be extracted from the header:
foreach($urls as $url) {
if(substr($url,0,4) == "http") {
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_NOBODY,true);
curl_setopt($c, CURLOPT_HEADER, true);
$result = @curl_exec($c);
$status = curl_getinfo($c,CURLINFO_HTTP_CODE);
curl_close($c);
$results[$i]['code'] = $status;
$results[$i]['url'] = $url;
if($status === 301 || $status === 302) {
preg_match("#https?://([-\w\.]+)+(:\d+)?(/([\w/_\-\.]*(\?\S+)?)?)?#",$result,$m);
$results[$i]['target'] = $m[0];
}
$i++;
}
}

No, there is no more efficient way.
You can use CURLOPT_WRITEHEADER + VariableStream,
so you could write the headers to a variable and parse them.
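Another way to get the headers into a variable is a header callback registered with CURLOPT_HEADERFUNCTION. Note that this is a different option from the CURLOPT_WRITEHEADER approach named above. The sketch below only shows the callback. The name makeHeaderCollector is invented for the example, and the calls at the end simulate what cURL would pass in so that the code runs without a network connection.

```php
<?php
/**
 * Return a callback that appends each header line it receives to $lines.
 * (Hypothetical helper name, chosen for this sketch.)
 */
function makeHeaderCollector(array &$lines): callable
{
    return function ($handle, string $headerLine) use (&$lines): int {
        $trimmed = trim($headerLine);
        if ($trimmed !== '') {
            $lines[] = $trimmed;
        }
        // cURL expects the number of bytes handled; any other value aborts the transfer.
        return strlen($headerLine);
    };
}

$headerLines = [];
$collector = makeHeaderCollector($headerLines);

// With a real handle (assumed here to be $ch) the callback would be registered as:
// curl_setopt($ch, CURLOPT_HEADERFUNCTION, $collector);

// Simulate the calls cURL would make for a short redirect response.
$collector(null, "HTTP/1.1 302 Found\r\n");
$collector(null, "Location: http://example.com/next\r\n");
$collector(null, "\r\n");

print_r($headerLines);
```

After the transfer, $headerLines holds one entry per header line and can be searched for the Location header.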

I had the same problem and curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); was of no help.
So, I decided not to use CURL but file_get_contents instead:
$data = file_get_contents($url);
$data = str_replace("<meta http-equiv=\"Refresh\" content=\"0;","<meta",$data);
The last line helped me to block the redirection, although the result is not clean HTML.
I parsed the data and could retrieve the redirection URL I wanted to get.
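The parsing step is not shown in the answer. One possible sketch, assuming the page uses a meta refresh tag of the form shown in the str_replace() call above, is a regular expression that pulls the URL out of the content attribute. The function name findMetaRefreshUrl and the sample HTML are made up for the example, and the pattern only covers the common double- or single-quoted form.

```php
<?php
/**
 * Extract the target URL from a <meta http-equiv="refresh"> tag,
 * or return null when the page has none.
 * (Hypothetical helper name, chosen for this sketch.)
 */
function findMetaRefreshUrl(string $html): ?string
{
    // Matches e.g. <meta http-equiv="Refresh" content="0; URL=http://example.com/next">
    $pattern = '~<meta[^>]+http-equiv\s*=\s*["\']?refresh["\']?[^>]*'
             . 'content\s*=\s*["\']\s*\d+\s*;\s*url\s*=\s*([^"\'>]+)~i';

    if (preg_match($pattern, $html, $match)) {
        return trim($match[1]);
    }
    return null;
}

// Made-up sample page for demonstration.
$page = '<html><head><meta http-equiv="Refresh" content="0; URL=http://example.com/next"></head></html>';
echo findMetaRefreshUrl($page), "\n";
```

Pages that put the content attribute before http-equiv, or that wrap the URL in its own quotes, would need a more tolerant pattern or a DOM parser.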

How can I get the destination URL using cURL?

How can I get the destination URL using cURL when the HTTP status code is 302?
<?PHP
$url = "http://www.ecs.soton.ac.uk/news/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$html = curl_exec($ch);
$status_code = curl_getinfo($ch,CURLINFO_HTTP_CODE);
if($status_code==302 or $status_code==301){
$url = "";
// I want to to get the destination url
}
curl_close($ch);
?>

You can use:
echo curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, TRUE); // We'll parse redirect url from header.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE); // We want to just get redirect url but not to follow it.
$response = curl_exec($ch);
preg_match_all('/^Location:(.*)$/mi', $response, $matches);
curl_close($ch);
echo !empty($matches[1]) ? trim($matches[1][0]) : 'No redirect found';

This answer comes a bit late, but I wanted to show a full working example, since some of the solutions out there are only fragments:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); //set url
curl_setopt($ch, CURLOPT_HEADER, true); //get header
curl_setopt($ch, CURLOPT_NOBODY, true); //do not include response body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); //do not show in browser the response
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); //follow any redirects
curl_exec($ch);
$new_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); //extract the url from the header response
curl_close($ch);
This works with any redirects such as 301 or 302, however on 404's it will just return the original url requested (since it wasn't found). This can be used to update or remove links from your site. This was my need anyway.

You have to grab the Location header for the redirected URL.

In response to user437797's comment on Tamik Soziev's answer (I unfortunately do not have the reputation to comment there directly) :
The CURLINFO_EFFECTIVE_URL works fine, but for it to do as op wants you also have to set CURLOPT_FOLLOWLOCATION to TRUE of course. This is because CURLINFO_EFFECTIVE_URL returns exactly what it says, the effective url that ends up getting loaded. If you don't follow redirects then this will be your requested url, if you do follow redirects then it will be the final url that is redirected to.
The nice thing about this approach is that it also works with multiple redirects, whereas when retrieving and parsing the HTTP header yourself you may have to do that multiple times before the final destination url is exposed.
Also note that the maximum number of redirects that curl follows can be controlled via CURLOPT_MAXREDIRS. By default it is unlimited (-1), but this may get you into trouble if someone (perhaps intentionally) configured an endless redirect loop for some URL.

The new destination for a 302 redirect is located in the HTTP header field "Location".
Example:
HTTP/1.1 302 Found
Date: Tue, 30 Jun 2002 1:20:30 GMT
Server: Apache
Location: http://www.foobar.com/foo/bar
Content-Type: text/html; charset=iso-8859-1
Just grep it with a regex.
To include the HTTP headers in the result, use the curl option CURLOPT_HEADER. Set it with:
curl_setopt($c, CURLOPT_HEADER, true);
If you simply want curl to follow the redirection use CURLOPT_FOLLOWLOCATION:
curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
Anyway, you shouldn't use the new URI because HTTP Statuscode 302 is only a temporary redirect.

Here's a way to get all headers returned by a curl http request, as well as the status code and an array of header lines for each header.
$url = 'http://google.com';
$opts = array(CURLOPT_URL => $url,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => true,
CURLOPT_FOLLOWLOCATION => true);
$ch = curl_init();
curl_setopt_array($ch, $opts);
$return = curl_exec($ch);
curl_close($ch);
$headers = http_response_headers($return);
foreach ($headers as $header) {
$str = http_response_code($header);
$hdr_arr = http_response_header_lines($header);
if (isset($hdr_arr['Location'])) {
$str .= ' - Location: ' . $hdr_arr['Location'];
}
echo $str . '<br />';
}
function http_response_headers($ret_str)
{
$hdrs = array();
$arr = explode("\r\n\r\n", $ret_str);
foreach ($arr as $each) {
if (substr($each, 0, 4) == 'HTTP') {
$hdrs[] = $each;
}
}
return $hdrs;
}
function http_response_header_lines($hdr_str)
{
$lines = explode("\n", $hdr_str);
$hdr_arr['status_line'] = trim(array_shift($lines));
foreach ($lines as $line) {
list($key, $val) = explode(':', $line, 2);
$hdr_arr[trim($key)] = trim($val);
}
return $hdr_arr;
}
function http_response_code($str)
{
return substr(trim(strstr($str, ' ')), 0, 3);
}

Use curl_getinfo($ch), and the first element (url) would indicate the effective URL.
