How can I get the destination URL using cURL?

How can I get the destination URL using cURL? - php

How can I get the destination URL using cURL when the HTTP status code is 302?
<?PHP
$url = "http://www.ecs.soton.ac.uk/news/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$html = curl_exec($ch);
$status_code = curl_getinfo($ch,CURLINFO_HTTP_CODE);
if($status_code=302 or $status_code=301){
$url = "";
// I want to to get the destination url
}
curl_close($ch);
?>

You can use:
echo curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, TRUE); // We'll parse redirect url from header.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE); // We want to just get redirect url but not to follow it.
$response = curl_exec($ch);
preg_match_all('/^Location:(.*)$/mi', $response, $matches);
curl_close($ch);
echo !empty($matches[1]) ? trim($matches[1][0]) : 'No redirect found';

A bit dated of a response but wanted to show a full working example, some of the solutions out there are pieces:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); //set url
curl_setopt($ch, CURLOPT_HEADER, true); //get header
curl_setopt($ch, CURLOPT_NOBODY, true); //do not include response body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); //do not show in browser the response
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); //follow any redirects
curl_exec($ch);
$new_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); //extract the url from the header response
curl_close($ch);
This works with any redirects such as 301 or 302, however on 404's it will just return the original url requested (since it wasn't found). This can be used to update or remove links from your site. This was my need anyway.

You have to grab the Location header for the redirected URL.

In response to user437797's comment on Tamik Soziev's answer (I unfortunately do not have the reputation to comment there directly) :
The CURLINFO_EFFECTIVE_URL works fine, but for it to do as op wants you also have to set CURLOPT_FOLLOWLOCATION to TRUE of course. This is because CURLINFO_EFFECTIVE_URL returns exactly what it says, the effective url that ends up getting loaded. If you don't follow redirects then this will be your requested url, if you do follow redirects then it will be the final url that is redirected to.
The nice thing about this approach is that it also works with multiple redirects, whereas when retrieving and parsing the HTTP header yourself you may have to do that multiple times before the final destination url is exposed.
Also note that the max number of redirects that curl follows can be controlled via CURLOPT_MAXREDIRS. By default it is unlimited (-1) but this may get you into trouble if someone (perhaps intentionally) configured and endless redirect loop for some url.

The new destination for a 302 redirect ist located in the http header field "location".
Example:
HTTP/1.1 302 Found
Date: Tue, 30 Jun 2002 1:20:30 GMT
Server: Apache
Location: http://www.foobar.com/foo/bar
Content-Type: text/html; charset=iso-8859-1
Just grep it with a regex.
To include all HTTP header information include it to the result with the curl option CURLOPT_HEADER. Set it with:
curl_setopt($c, CURLOPT_HEADER, true);
If you simply want curl to follow the redirection use CURLOPT_FOLLOWLOCATION:
curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
Anyway, you shouldn't use the new URI because HTTP Statuscode 302 is only a temporary redirect.

Here's a way to get all headers returned by a curl http request, as well as the status code and an array of header lines for each header.
$url = 'http://google.com';
$opts = array(CURLOPT_URL => $url,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => true,
CURLOPT_FOLLOWLOCATION => true);
$ch = curl_init();
curl_setopt_array($ch, $opts);
$return = curl_exec($ch);
curl_close($ch);
$headers = http_response_headers($return);
foreach ($headers as $header) {
$str = http_response_code($header);
$hdr_arr = http_response_header_lines($header);
if (isset($hdr_arr['Location'])) {
$str .= ' - Location: ' . $hdr_arr['Location'];
}
echo $str . '<br />';
}
function http_response_headers($ret_str)
{
$hdrs = array();
$arr = explode("\r\n\r\n", $ret_str);
foreach ($arr as $each) {
if (substr($each, 0, 4) == 'HTTP') {
$hdrs[] = $each;
}
}
return $hdrs;
}
function http_response_header_lines($hdr_str)
{
$lines = explode("\n", $hdr_str);
$hdr_arr['status_line'] = trim(array_shift($lines));
foreach ($lines as $line) {
list($key, $val) = explode(':', $line, 2);
$hdr_arr[trim($key)] = trim($val);
}
return $hdr_arr;
}
function http_response_code($str)
{
return substr(trim(strstr($str, ' ')), 0, 3);
}

Use curl_getinfo($ch), and the first element (url) would indicate the effective URL.

Related

Shopify REST API Pagination Link Empty

Situation
I am trying to make a call to the Shopify REST API where I have more than 50-250 results but I am not able to get the Link Header from the cURL Response which contains the Pagination Links.
Sample of Link Headers from the API Documentation for Cursor-Pagination (https://shopify.dev/tutorials/make-paginated-requests-to-rest-admin-api)
#...
Link: "<https://{shop}.myshopify.com/admin/api/{version}/products.json?page_info={page_info}&limit={limit}>; rel={next}, <https://{shop}.myshopify.com/admin/api/{version}/products.json?page_info={page_info}&limit={limit}>; rel={previous}"
#...
The link rel parameter does show up, but the Link is empty as below.
My Shopify Call function
function shopify_call($token, $shop, $api_endpoint, $query = array(), $method = 'GET', $request_headers = array()) {
// Build URL
$url = "https://" . $shop . ".myshopify.com" . $api_endpoint;
if (!is_null($query) && in_array($method, array('GET', 'DELETE'))) $url = $url . "?" . http_build_query($query);
$headers = [];
// Configure cURL
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_HEADER, TRUE);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
// this function is called by curl for each header received
curl_setopt($curl, CURLOPT_HEADERFUNCTION,
function($ch, $header) use (&$headers)
{
$len = strlen($header);
$header = explode(':', $header, 2);
if (count($header) < 2) // ignore invalid headers
return $len;
$headers[trim($header[0])] = trim($header[1]);
return $len;
}
);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($curl, CURLOPT_MAXREDIRS, 3);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
// curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 3);
// curl_setopt($curl, CURLOPT_SSLVERSION, 3);
curl_setopt($curl, CURLOPT_USERAGENT, 'Sphyx App v.1');
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl, CURLOPT_TIMEOUT, 30);
curl_setopt($curl, CURLOPT_CUSTOMREQUEST, $method);
curl_setopt($curl,CURLOPT_ENCODING,'');
// Setup headers
$request_headers[] = "";
if (!is_null($token)) $request_headers[] = "X-Shopify-Access-Token: " . $token;
$request_headers[] = 'Accept: */*'; // Copied from POSTMAN
$request_headers[] = 'Accept-Encoding: gzip, deflate, br'; // Copied from POSTMAN
curl_setopt($curl, CURLOPT_HTTPHEADER, $request_headers);
if ($method !== 'GET' && in_array($method, array('POST', 'PUT'))) {
if (is_array($query)) $query = http_build_query($query);
curl_setopt ($curl, CURLOPT_POSTFIELDS, $query);
}
// Send request to Shopify and capture any errors
$result = curl_exec($curl);
$response = preg_split("/\r\n\r\n|\n\n|\r\r/", $result, 2);
$error_number = curl_errno($curl);
$error_message = curl_error($curl);
// Close cURL to be nice
curl_close($curl);
// Return an error is cURL has a problem
if ($error_number) {
return $error_message;
} else {
// Return headers and Shopify's response
return array('headers' => $headers, 'response' => json_decode($response[1],true));
}
}
But when I use a POSTMAN Collection, I get a proper formatted response without the Link getting truncated/processed.
I have tried a lot of things here available via the StackOverflow Forums as well as Shopify Community, but I'm unable to parse the Response Header the same way as shown by API Examples or POSTMAN
My issue does seem to be with the PHP Code, but I'm not a pro with cURL. Thus, I'm not able to make it further :(
Also, I'm not able to understand why POSTMAN's Headers are in Proper Case whereas mine are in Lower Case
Thanks in Advance!

Found my answer :
https://community.shopify.com/c/Shopify-APIs-SDKs/Help-with-cursor-based-paging/m-p/579640#M38946
I was using a browser to view my log files. So the data is there but it's hidden because of your use of '<'s around the data. I had to use the browser inspector to see the data. Not sure who decided this syntax was a good idea. Preference would be two headers that one can see and more easily parse since using link syntax is not relative to using an API.
My suggestion would be 2 headers:
X-Shopify-Page-Next: page_info_value (empty if no more pages)
X-Shopify-Page-Perv: page_info_value (empty on first page or if there is no previous page).
Easy to parse and use.
But having this buried as an invalid xml tag, having them both in the same header and using 'rel=' syntax makes no sense at all from an API perspective.

How to convert Google short URLs back to its original URL?

I have some short URLs. How can I get the original URLs from them?.

As the URL-shortener-services are mostly simple redirectors they use the location header to tell the browser where to go to.
You can use PHP's own get_headers() function to get the appropriate header:
$headers = get_headers('http://shorten.ed/fbsfS' , true);
echo $headers['Location'];

Try this
<?php
$url="http://goo.gl/fbsfS";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$a = curl_exec($ch); // $a will contain all headers
$url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); // This is what you need, it will return you the last effective URL
echo $url; // Redirected url
?>

You can use the curl functions for this:
// The short url to expand
$url = 'http://goo.gl/fbsfS';
// Prepare a request for the given URL
$curl = curl_init($url);
// Set the needed options:
curl_setopt_array($curl, array(
CURLOPT_NOBODY => TRUE, // Don't ask for a body, we only need the headers
CURLOPT_FOLLOWLOCATION => FALSE, // Don't follow the 'Location:' header, if any
));
// Send the request (you should check the returned value for errors)
curl_exec($curl);
// Get information about the 'Location:' header (if any)
$location = curl_getinfo($curl, CURLINFO_REDIRECT_URL);
// This should print:
// http://translate.google.com.ar/translate?hl=es&sl=en&u=http://goo.gl/lw9sU
echo($location);

For all services there is an API, which you can use.

Get Final URL From Double Shortened URL (t.co -> bit.ly -> final)

I couldn't convert a double shortened URL to expanded URL successfully using the below function I got from here:
function doShortURLDecode($url) {
$ch = #curl_init($url);
#curl_setopt($ch, CURLOPT_HEADER, TRUE);
#curl_setopt($ch, CURLOPT_NOBODY, TRUE);
#curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
#curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$response = #curl_exec($ch);
preg_match('/Location: (.*)\n/', $response, $a);
if (!isset($a[1])) return $url;
return $a[1];
}
I got into trouble when the expanded URL I got was again a shortened URL, which has its expanded URL.
How do I get final expanded URL after it has run through both URL shortening services?

Since t.co uses HTML redirection through the use of JavaScript and/or a <meta> redirect we need to grab it's contents first. Then extract the bit.ly URL from it to perform a HTTP header request to get the final location. This method does not rely on cURL to be enabled on server and uses all native PHP5 functions:
Tested and working!
function large_url($url)
{
$data = file_get_contents($url); // t.co uses HTML redirection
$url = strtok(strstr($data, 'http://bit.ly/'), '"'); // grab bit.ly URL
stream_context_set_default(array('http' => array('method' => 'HEAD')));
$headers = get_headers($url, 1); // get HTTP headers
return (isset($headers['Location'])) // check if Location header set
? $headers['Location'] // return Location header value
: $url; // return bit.ly URL instead
}
// DEMO
$url = 'http://t.co/dd4b3kOz';
echo large_url($url);

Finally found a way to get the final url of a double shortened url. The best way is to use longurl api for it.
I am not sure if it is the correct way, but i am at last getting the output as the final url needed :)
Here's what i did:
<?php
function TextAfterTag($input, $tag)
{
$result = '';
$tagPos = strpos($input, $tag);
if (!($tagPos === false))
{
$length = strlen($input);
$substrLength = $length - $tagPos + 1;
$result = substr($input, $tagPos + 1, $substrLength);
}
return trim($result);
}
function expandUrlLongApi($url)
{
$format = 'json';
$api_query = "http://api.longurl.org/v2/expand?" .
"url={$url}&response-code=1&format={$format}";
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $api_query );
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_HEADER, false);
$fileContents = curl_exec($ch);
curl_close($ch);
$s1=str_replace("{"," ","$fileContents");
$s2=str_replace("}"," ","$s1");
$s2=trim($s2);
$s3=array();
$s3=explode(",",$s2);
$s4=TextAfterTag($s3[0],(':'));
$s4=stripslashes($s4);
return $s4;
}
echo expandUrlLongApi('http://t.co/dd4b3kOz');
?>
The output i get is:
"http://changeordie.therepublik.net/?p=371#proliferation"
The above code works.
The code that #cryptic shared is also correct ,but i could not get the result on my server (maybe because of some configuration issue).
If anyone thinks that it could be done by some other way, please feel free to share it.

Perhaps you should just use CURLOPT_FOLLOWLOCATION = true and then determine the final URL you were directed to.

In case the problem is not a Javascript redirect as in t.co or a <META http-equiv="refresh"..., this is reslolving stackexchange URLs like https://stackoverflow.com/q/62317 fine:
public function doShortURLDecode($url) {
$ch = #curl_init($url);
#curl_setopt($ch, CURLOPT_HEADER, TRUE);
#curl_setopt($ch, CURLOPT_NOBODY, TRUE);
#curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
#curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$response = #curl_exec($ch);
$cleanresponse= preg_replace('/[^A-Za-z0-9\- _,.:\n\/]/', '', $response);
preg_match('/Location: (.*)[\n\r]/', $cleanresponse, $a);
if (!isset($a[1])) return $url;
return parse_url($url, PHP_URL_SCHEME).'://'.parse_url($url, PHP_URL_HOST).$a[1];
}
It cleans the response of any special characters, that can occur in the curl output before cuttoing out the result URL (I ran into this problem on a php7.3 server)

Get the url, a given url redirects to

I mine data from rss links and get a bunch of urls like:
http://feedproxy.google.com/~r/electricpig/~3/qoF8XbocUbE/
.... and if I access the links in my web browser, I am redirected to something like:
http://www.electricpig.co.uk/stuff.
Is there a way in php to write a function that, when given a url "a" that redirects the user to an url "b", returns you the url "b" ?

Here you go:
function getRedirect($oldUrl) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $oldUrl);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$res = curl_exec($ch);
$newUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
return $newUrl;
}
The function requires cURL, and makes use of CURLINO_EFFECTIVE_URL. You can look it up on phpdoc here
EDIT:
if you are certain the oldUrl is not redirecting to newUrl via javascript, then you can also avoid fetching the body of the newUrl using
curl_setopt($ch, CURLOPT_NOBODY, TRUE); // remove body
Put the above line before $res = curl_exec($ch); in the function getRedirect to achiever faster execution.

public function getRedirect($url) {
$headers = get_headers($url, 1);
if (array_key_exists("Location", $headers)) {
$url = getRedirect($headers["Location"]);
}
return $url;
}

PHP cURL: Get target of redirect, without following it

The curl_getinfo function returns a lot of metadata about the result of an HTTP request. However, for some reason it doesn't include the bit of information I want at the moment, which is the target URL if the request returns an HTTP redirection code.
I'm not using CURLOPT_FOLLOWLOCATION because I want to handle specific redirect codes as special cases.
If cURL can follow redirects, why can't it tell me what they redirect to when it isn't following them?
Of course, I could set the CURLOPT_HEADER flag and pick out the Location header. But is there a more efficient way?

This can be done in 4 steps:
Step 1. Initialise curl
curl_init($ch); //initialise the curl handle
//COOKIESESSION is optional, use if you want to keep cookies in memory
curl_setopt($this->ch, CURLOPT_COOKIESESSION, true);
Step 2. Get the headers for $url
curl_setopt($ch, CURLOPT_URL, $url); //specify your URL
curl_setopt($ch, CURLOPT_HEADER, true); //include headers in http data
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); //don't follow redirects
$http_data = curl_exec($ch); //hit the $url
$curl_info = curl_getinfo($ch);
$headers = substr($http_data, 0, $curl_info['header_size']); //split out header
Step 3. Check if you have the correct response code
if (!($curl_info['http_code']>299 && $curl_info['http_code']<309)) {
//return, echo, die, whatever you like
return 'Error - http code'.$curl_info['http_code'].' received.';
}
Step 4. Parse the headers to get the new URL
preg_match("!\r\n(?:Location|URI): *(.*?) *\r\n!", $headers, $matches);
$url = $matches[1];
Once you have the new URL you can then repeat steps 2-4 as often as you like.

You can simply use it: (CURLINFO_REDIRECT_URL)
$info = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
echo $info; // the redirect URL without following it
as you mentioned, disable the CURLOPT_FOLLOWLOCATION option (before executing) and place my code after executing.
CURLINFO_REDIRECT_URL - With the CURLOPT_FOLLOWLOCATION option
disabled: redirect URL found in the last transaction, that should be
requested manually next. With the CURLOPT_FOLLOWLOCATION option
enabled: this is empty. The redirect URL in this case is available in
CURLINFO_EFFECTIVE_URL
Refrence

curl doesn't seem to have a function or option to get the redirect target, it can be extracted using various techniques:
From the response:
Apache can respond with a HTML page in case of a 301 redirect (Doesn't seem to be the case with 302's).
If the response has a format similar to:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved here.</p>
<hr>
<address>Apache/2.2.16 (Debian) Server at www.xxx.yyy Port 80</address>
</body></html>
You can extract the redirect URL using DOMXPath:
$i = 0;
foreach($urls as $url) {
if(substr($url,0,4) == "http") {
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$result = #curl_exec($c);
$status = curl_getinfo($c,CURLINFO_HTTP_CODE);
curl_close($c);
$results[$i]['code'] = $status;
$results[$i]['url'] = $url;
if($status === 301) {
$xml = new DOMDocument();
$xml->loadHTML($result);
$xpath = new DOMXPath($xml);
$href = $xpath->query("//*[#href]")->item(0);
$results[$i]['target'] = $href->attributes->getNamedItem('href')->nodeValue;
}
$i++;
}
}
Using CURLOPT_NOBODY
There is a faster way however, as #gAMBOOKa points out; Using CURLOPT_NOBODY. This approach just sends a HEAD request instead of GET (not downloading the actual content, so it should be faster and more efficient) and stores the response header.
Using a regex the target URL can be extracted from the header:
foreach($urls as $url) {
if(substr($url,0,4) == "http") {
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_NOBODY,true);
curl_setopt($c, CURLOPT_HEADER, true);
$result = #curl_exec($c);
$status = curl_getinfo($c,CURLINFO_HTTP_CODE);
curl_close($c);
$results[$i]['code'] = $status;
$results[$i]['url'] = $url;
if($status === 301 || $status === 302) {
preg_match("#https?://([-\w\.]+)+(:\d+)?(/([\w/_\-\.]*(\?\S+)?)?)?#",$result,$m);
$results[$i]['target'] = $m[0];
}
$i++;
}
}

No there is no more efficient way
Your can use CURLOPT_WRITEHEADER + VariableStream
So.. you could write headers to variable and parse it

I had the same problem and curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); was of any help.
So, I decided not to use CURL but file_get_contents instead:
$data = file_get_contents($url);
$data = str_replace("<meta http-equiv=\"Refresh\" content=\"0;","<meta",$data);
The last line helped me to block the redirection although the product is not a clean html code.
I parsed the data and could retrieve the redirection URL I wanted to get.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

How can I get the destination URL using cURL? - php

You can use: echo curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

You have to grab the Location header for the redirected URL.

Use curl_getinfo($ch), and the first element (url) would indicate the effective URL.

Related

Shopify REST API Pagination Link Empty

How to convert Google short URLs back to its original URL?

Get Final URL From Double Shortened URL (t.co -> bit.ly -> final)

Get the url, a given url redirects to

PHP cURL: Get target of redirect, without following it

Categories

Resources