I want a direct link to videos from Vimeo with a PHP script.
I managed to find them manually, but my PHP script does not work.
Here is the situation:
For example I took this video: http://vimeo.com/22439234
When you go on the page, Vimeo generates a signature associated with the current timestamp and this video. This information is stored in a JavaScript variable, around line 520 just after:
window.addEvent ('domready', function () {
Then when you click Play, the HTML5 player reads this variable and sends an HTTP request:
http://player.vimeo.com/play_redirect?clip_id=37111719&sig={SIGNATURE}&time={TIMESTAMP}&quality=sd&codecs=H264,VP8,VP6&type=moogaloop_local&embed_location=
But it also works with:
http://player.vimeo.com/play_redirect?clip_id=37111719&sig={SIGNATURE}&time={TIMESTAMP}&quality=sd
If this URL is not requested from the same IP address that opened http://vimeo.com/22439234, it returns the HTTP code 200 with an error message.
If this URL is requested from the correct IP address, the Location header redirects to the video file:
http://av.vimeo.com/XXX/XX/XXXX.mp4?aksessionid=XXXX&token=XXXXX_XXXXXXXXX
When I build this http://player.vimeo.com/play_redirect?... link manually ("right click" > "source code" > "line 520"), it works.
But with PHP and regex it returns the HTTP code 200 with an error message.
Why?
From my observations, Vimeo does not check the headers of the HTTP request for http://player.vimeo.com/play_redirect?...
GET or HEAD, with or without cookies, with or without a referrer: none of it changes the result.
With PHP, I use the functions file_get_contents() and get_headers().
<?php
function getVimeo($id) {
$content = file_get_contents('http://vimeo.com/'.$id);
if (preg_match('#document\.getElementById\(\'player_(.+)\n#i', $content, $scriptBlock) == 0)
return 1;
preg_match('#"timestamp":([0-9]+)#i', $scriptBlock[1], $matches);
$timestamp = $matches[1];
preg_match('#"signature":"([a-z0-9]+)"#i', $scriptBlock[1], $matches);
$signature = $matches[1];
$url = 'http://player.vimeo.com/play_redirect?clip_id='.$id.'&sig='.$signature.'&time='.$timestamp.'&quality=sd';
print_r(get_headers($url, 1));
}
The algorithm looks like this:
Input data: vimeoUrl.
content = getRemoteContent(vimeoUrl).
Parse content to find and extract the value of the data-config-url attribute.
Navigate to data-config-url and load the content as JSON Object:
$video = json_decode($this->getRemoteContent($video->getAttribute('data-config-url')));
Return $video->request->files->h264->sd->url. This is a direct link to the SD-quality video.
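The two parsing steps above can be tried without any network access. This is only a minimal sketch: extractConfigUrl() and extractSdUrl() are hypothetical helper names, and the HTML and JSON shapes are assumed to match what the steps describe (a div carrying data-config-url, and a request->files->h264->sd->url path).

```php
<?php
// Hypothetical helper: pull the data-config-url attribute out of a page's HTML.
function extractConfigUrl(string $html): ?string
{
    if ($html === '') {
        return null; // nothing to parse
    }
    $dom = new DOMDocument();
    // Real-world pages are rarely valid markup; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//div[@data-config-url]');
    if ($nodes === false || $nodes->length === 0) {
        return null; // attribute not present in this page
    }
    return $nodes->item(0)->getAttribute('data-config-url');
}

// Hypothetical helper: read the SD H.264 URL from the config JSON text.
function extractSdUrl(string $json): ?string
{
    $config = json_decode($json);
    // ?? behaves like isset(), so a missing level simply yields null.
    return $config->request->files->h264->sd->url ?? null;
}
```

Both helpers return null instead of raising an error when the page layout or the JSON structure is not what was expected, which makes a layout change on the remote site easier to detect.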
Here is my simple class, which works at the moment.
class VideoController
{
/**
* @var array Vimeo video quality priority
*/
public $vimeoQualityPrioritet = array('sd', 'hd', 'mobile');
/**
* @var string Vimeo video codec priority
*/
public $vimeoVideoCodec = 'h264';
/**
* Get direct URL to Vimeo video file
*
* @param string $url to video on Vimeo
* @return string file URL
*/
public function getVimeoDirectUrl($url)
{
$result = '';
$videoInfo = $this->getVimeoVideoInfo($url);
if ($videoInfo && $videoObject = $this->getVimeoQualityVideo($videoInfo->request->files))
{
$result = $videoObject->url;
}
return $result;
}
/**
* Get Vimeo video info
*
* @param string $url to video on Vimeo
* @return \stdClass|null result
*/
public function getVimeoVideoInfo($url)
{
$videoInfo = null;
$page = $this->getRemoteContent($url);
$dom = new \DOMDocument("1.0", "utf-8");
libxml_use_internal_errors(true);
$dom->loadHTML('<?xml version="1.0" encoding="UTF-8"?>' . "\n" . $page);
$xPath = new \DOMXpath($dom);
$video = $xPath->query('//div[@data-config-url]');
if ($video && $video->length > 0) // an empty DOMNodeList is still truthy
{
$videoObj = json_decode($this->getRemoteContent($video->item(0)->getAttribute('data-config-url')));
if (!property_exists($videoObj, 'message'))
{
$videoInfo = $videoObj;
}
}
return $videoInfo;
}
/**
* Get vimeo video object
*
* @param stdClass $files object of Vimeo files
* @return stdClass Video file object
*/
public function getVimeoQualityVideo($files)
{
$video = null;
if (!property_exists($files, $this->vimeoVideoCodec) && count($files->codecs))
{
$this->vimeoVideoCodec = array_shift($files->codecs);
}
$codecFiles = $files->{$this->vimeoVideoCodec};
foreach ($this->vimeoQualityPrioritet as $quality)
{
if (property_exists($codecFiles, $quality))
{
$video = $codecFiles->{$quality};
break;
}
}
if (!$video)
{
foreach (get_object_vars($codecFiles) as $file)
{
$video = $file;
break;
}
}
return $video;
}
/**
* Get remote content by URL
*
* @param string $url remote page URL
* @return string result content
*/
public function getRemoteContent($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_USERAGENT, 'spider');
$content = curl_exec($ch);
curl_close($ch);
return $content;
}
}
Usage:
$video = new VideoController;
var_dump($video->getVimeoDirectUrl('http://vimeo.com/90747156'));
Try adding a valid User-Agent header to each request.
For this you must use cURL or HttpRequest instead of file_get_contents().
After making these changes I got a working link for downloading the video file.
Here is my code:
function getVimeo($id) {
// get page with a player
$queryResult = httpQuery('http://vimeo.com/' . $id);
$content = $queryResult['content'];
if (preg_match('#document\.getElementById\(\'player_(.+)\n#i', $content, $scriptBlock) == 0)
return 1;
preg_match('#"timestamp":([0-9]+)#i', $scriptBlock[1], $matches);
$timestamp = $matches[1];
preg_match('#"signature":"([a-z0-9]+)"#i', $scriptBlock[1], $matches);
$signature = $matches[1];
$url = 'http://player.vimeo.com/play_redirect?clip_id=' . $id . '&sig=' . $signature . '&time=' . $timestamp . '&quality=sd';
// make the request for getting a video url
#print_r(get_headers($url, 1));
$finalQuery = httpQuery($url);
return $finalQuery['redirect_url'];
}
// make queries via CURL
function httpQuery($url) {
$options = array(
CURLOPT_USERAGENT => 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.19 (KHTML, like Gecko) Ubuntu/12.04 Chromium/18.0.1025.168 Chrome/18.0.1025.168 Safari/535.19',
CURLOPT_RETURNTRANSFER => true,
);
$ch = curl_init($url);
curl_setopt_array($ch, $options);
$content = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
$result = $info;
$result['content'] = $content;
return $result;
}
echo getVimeo(22439234);
I'm using the GoogleMaps API to get the lat and long for an address, but strangely enough Google Maps causes PHP to run out of memory when a # symbol is present in the string...
The code:
//some attempted addresses: "1234 Memory Lane Suite #1", "4321 Test Dr #4", "#"
$this->address = htmlentities($_POST['address']);
$googlemaps = new GoogleMaps(GOOGLE_MAPS_API_KEY);
$coordinates = $googlemaps->getCoordinates(
'UNITED STATES, ' .
$this->state . ', ' .
$this->city . ', ' .
$this->address
);
The error:
Allowed memory size of 268435456 bytes exhausted (tried to allocate 19574456 bytes)
When I strip out the pound symbol, everything works correctly. I'm just curious: what is Google's problem with the hash symbol?
Here's the source code of the API. It's pretty short:
class GoogleMaps {
/**
* The Google Maps API key holder
* @var string
*/
private $mapApiKey;
/**
* Class Constructor
*/
public function __construct() {
}
/**
* Reads an URL to a string
* @param string $url The URL to read from
* @return string The URL content
*/
private function getURL($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_URL, $url);
$tmp = curl_exec($ch);
curl_close($ch);
if ($tmp != false){
return $tmp;
}
}
/**
* Get Latitude/Longitude/Altitude based on an address
* @param string $address The address for converting into coordinates
* @return array An array containing Latitude/Longitude/Altitude data
*/
public function getCoordinates($address){
$address = str_replace(' ','+',$address);
$url = 'http://maps.google.com/maps/geo?q=' . $address . '&output=xml';
$data = $this->getURL($url);
if ($data){
$xml = new SimpleXMLElement($data);
$requestCode = $xml->Response->Status->code;
if ($requestCode == 200){
//all is ok
$coords = $xml->Response->Placemark->Point->coordinates;
$coords = explode(',',$coords);
if (count($coords) > 1){
if (count($coords) == 3){
return array('lat' => $coords[1], 'long' => $coords[0], 'alt' => $coords[2]);
} else {
return array('lat' => $coords[1], 'long' => $coords[0], 'alt' => 0);
}
}
}
}
//return default data
return array('lat' => 0, 'long' => 0, 'alt' => 0);
}
}; //end class
Encode the address with urlencode(). Otherwise Google treats the # as the start of a URL fragment (an anchor) rather than as part of the q parameter, so the output parameter is ignored and the response is JSON, not XML.
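To make the effect concrete, here is a small sketch of the URL construction with urlencode() applied. buildGeocodeUrl() is a hypothetical helper that only mirrors the string built in getCoordinates(); it does not call the service.

```php
<?php
// Hypothetical helper mirroring the URL built in getCoordinates().
function buildGeocodeUrl(string $address): string
{
    // urlencode() turns '#' into '%23' and spaces into '+',
    // so the whole address stays inside the q parameter.
    return 'http://maps.google.com/maps/geo?q=' . urlencode($address) . '&output=xml';
}

echo buildGeocodeUrl('4321 Test Dr #4'), "\n";
// http://maps.google.com/maps/geo?q=4321+Test+Dr+%234&output=xml
```

Because urlencode() already converts spaces to '+', the separate str_replace(' ', '+', $address) step is no longer needed once the address is encoded this way.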
I want to use the Plesk API for PHP. I downloaded a sample from the Parallels website and tried to use it for my website. When I open the page on my website I get the following error:
Fatal error: Uncaught exception 'Exception' with message 'String could not be parsed as XML
The code I use:
<?php
error_reporting(E_ALL);
ini_set("display_errors", 1);
/**
* Reports error during API RPC request
*/
class ApiRequestException extends Exception {}
/**
* Returns DOM object representing request for information about all available domains
* @return DOMDocument
*/
function domainsInfoRequest()
{
$xmldoc = new DomDocument('1.0', 'UTF-8');
$xmldoc->formatOutput = true;
// <packet>
$packet = $xmldoc->createElement('packet');
$packet->setAttribute('version', '1.4.1.2');
$xmldoc->appendChild($packet);
// <packet/domain>
$domain = $xmldoc->createElement('domain');
$packet->appendChild($domain);
// <packet/domain/get>
$get = $xmldoc->createElement('get');
$domain->appendChild($get);
// <packet/domain/get/filter>
$filter = $xmldoc->createElement('filter');
$get->appendChild($filter);
// <packet/domain/get/dataset>
$dataset = $xmldoc->createElement('dataset');
$get->appendChild($dataset);
// dataset elements
$dataset->appendChild($xmldoc->createElement('limits'));
$dataset->appendChild($xmldoc->createElement('prefs'));
$dataset->appendChild($xmldoc->createElement('user'));
$dataset->appendChild($xmldoc->createElement('hosting'));
$dataset->appendChild($xmldoc->createElement('stat'));
$dataset->appendChild($xmldoc->createElement('gen_info'));
return $xmldoc;
}
/**
* Prepares CURL to perform Plesk API request
* @return resource
*/
function curlInit($host, $login, $password)
{
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "https://{$host}:8443/enterprise/control/agent.php");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_HTTPHEADER,
array("HTTP_AUTH_LOGIN: {$login}",
"HTTP_AUTH_PASSWD: {$password}",
"HTTP_PRETTY_PRINT: TRUE",
"Content-Type: text/xml")
);
return $curl;
}
/**
* Performs a Plesk API request, returns raw API response text
*
* @return string
* @throws ApiRequestException
*/
function sendRequest($curl, $packet)
{
curl_setopt($curl, CURLOPT_POSTFIELDS, $packet);
$result = curl_exec($curl);
if (curl_errno($curl)) {
$errmsg = curl_error($curl);
$errcode = curl_errno($curl);
curl_close($curl);
throw new ApiRequestException($errmsg, $errcode);
}
curl_close($curl);
return $result;
}
/**
* Looks if API responded with correct data
*
* @return SimpleXMLElement
* @throws ApiRequestException
*/
function parseResponse($response_string)
{
$xml = new SimpleXMLElement($response_string);
if (!is_a($xml, 'SimpleXMLElement'))
throw new ApiRequestException("Can not parse server response: {$response_string}");
return $xml;
}
/**
* Check data in API response
* @return void
* @throws ApiRequestException
*/
function checkResponse(SimpleXMLElement $response)
{
$resultNode = $response->domain->get->result;
// check if request was successful
if ('error' == (string)$resultNode->status)
throw new ApiRequestException("Plesk API returned error: " . (string)$resultNode->result->errtext);
}
//
// int main()
//
$host = '************';
$login = '************';
$password = '************';
$curl = curlInit($host, $login, $password);
try {
$response = sendRequest($curl, domainsInfoRequest()->saveXML());
$responseXml = parseResponse($response);
checkResponse($responseXml);
} catch (ApiRequestException $e) {
echo $e;
die();
}
// Explore the result
foreach ($responseXml->xpath('/packet/domain/get/result') as $resultNode) {
echo "Domain id: " . (string)$resultNode->id . " ";
echo (string)$resultNode->data->gen_info->name . " (" . (string)$resultNode->data->gen_info->dns_ip_address . ")\n";
}
?>
I hope someone can help me to find a solution.
Your script works perfectly with my server (as far as I was able to restore the formatting).
It seems to be based on the example provided in the original Parallels documentation. I also took an example from the Plesk integration guide and ran it against my server; it works as well.
I would guess at some misconfiguration of your Plesk server. You could troubleshoot it by printing the XML request (domainsInfoRequest()->saveXML()) and the XML response ($response). For some reason your $response apparently contains something other than valid XML. If you are not sure, you can copy and paste it into a file and run xmllint (an XML validation tool) on it.
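As a rough way to see what the server actually sent, the response can be checked with libxml's error collection before it is handed to SimpleXMLElement. This is a sketch only: describeXmlProblem() is a hypothetical helper, not part of the Parallels sample, and the 200-character preview length is arbitrary.

```php
<?php
// Hypothetical diagnostic helper: returns null for well-formed XML,
// otherwise a short description of what is wrong with the response.
function describeXmlProblem($responseString): ?string
{
    if (!is_string($responseString) || trim($responseString) === '') {
        return 'Empty response (curl_exec() may have returned false or an empty body)';
    }
    // Collect parser errors instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($responseString);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml !== false) {
        return null; // well-formed XML
    }
    $first = $errors ? trim($errors[0]->message) : 'unknown parser error';
    // Show the start of the body: an HTML error page or a PHP warning is easy to spot.
    return 'Not XML: ' . $first . ' | body starts with: ' . substr($responseString, 0, 200);
}
```

Calling it as describeXmlProblem($response) right after sendRequest() and echoing any non-null result should show whether the body is empty, an HTML page, or something else that SimpleXMLElement cannot parse.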
I am trying to create a PHP GoToMeeting API implementation. I successfully got the access_token, but for any other request I get error responses. This is my code:
<?php
session_start();
$key = '#';
$secret = '#';
$domain = $_SERVER['HTTP_HOST'];
$base = "/oauth/index.php";
$base_url = urlencode("http://$domain$base");
$OAuth_url = "https://api.citrixonline.com/oauth/authorize?client_id=$key&redirect_uri=$base_url";
$OAuth_exchange_keys_url = "http://api.citrixonline.com/oauth/access_token?grant_type=authorization_code&code={responseKey}&client_id=$key";
if($_SESSION['access_token']) CreateForm();else
if($_GET['send']) OAuth_Authentication($OAuth_url);
elseif($_GET['code']) OAuth_Exchanging_Response_Key($_GET['code'],$OAuth_exchange_keys_url);
function OAuth_Authentication ($url){
$_SESSION['access_token'] = false;
header("Location: $url");
}
function CreateForm(){
$data = getURL('https://api.citrixonline.com/G2M/rest/meetings?oauth_token='.$_SESSION['access_token'],false);
}
function OAuth_Exchanging_Response_Key($code,$url){
if($_SESSION['access_token']){
CreateForm();
return true;
}
$data = getURL(str_replace('{responseKey}',$code,$url));
if(IsJsonString($data)){
$data = json_decode($data);
$_SESSION['access_token'] = $data->access_token;
CreateForm();
}else{
echo 'error';
}
}
/*
* Helper functions
*/
/*
* checks if a string is json
*/
function IsJsonString($str){
try{
$jObject = json_decode($str);
}catch(Exception $e){
return false;
}
return (is_object($jObject)) ? true : false;
}
/*
* CURL function to get url
*/
function getURL($url,$auth_token = false,$data=false){
// Initialize session and set URL.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// Set so curl_exec returns the result instead of outputting it.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
if($auth_token){
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Authorization: OAuth oauth_token='.$auth_token));
}
if($data){
curl_setopt($ch, CURLOPT_POST,true);
$d = json_encode('{ "subject":"test", "starttime":"2011-12-01T09:00:00Z", "endtime":"2011-12-01T10:00:00Z", "passwordrequired":false, "conferencecallinfo":"test", "timezonekey":"", "meetingtype":"Scheduled" }');
echo implode('&', array_map('urlify',array_keys($data),$data));
echo ';';
curl_setopt($ch, CURLOPT_POSTFIELDS,
implode('&', array_map('urlify',array_keys($data),$data))
);
}
// Get the response and close the channel.
$response = curl_exec($ch);
/*
* if redirect, redirect
*/
$code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($code == 301 || $code == 302) {
preg_match('/<a href="(.*?)">/', $response, $matches);
$newurl = str_replace('&amp;','&',trim(array_pop($matches)));
$response = getURL($newurl);
} else {
$code = 0;
}
curl_close($ch);
return $response;
}
function urlify($key, $val) {
return urlencode($key).'='.urlencode($val);
}
To start the connection process you need to make a request to the PHP file with send=1. I tried different approaches to get the list of meetings but could not get a good response.
Has anybody had this problem before, or does anyone know of a solution?
Edit:
This is not a cURL error; the server responds with error messages. On the Citrix forums they say it should work, with no further details on why it doesn't, or whether the problem is in the way I implemented OAuth or in the request code. The most common error I get is "error code:31305", which is not documented on the forum.
[I also posted this on the Citrix Developer Forums, but for completeness will mention it here as well.]
We are still finalizing the documentation for these interfaces and some parameters which are written as optional are actually required.
Compared to your example above, changes needed are:
set timezonekey to 67 (Pacific time)
set passwordrequired to false
set conferencecallinfo to Hybrid (meaning: both PSTN and VOIP will be provided)
Taking those changes into account, your sample data would look more like the following:
{"subject":"test meeting", "starttime":"2012-02-01T08:00:00",
"endtime":"2012-02-01T09:00:00", "timezonekey":"67",
"meetingtype":"Scheduled", "passwordrequired":"false",
"conferencecallinfo":"Hybrid"}
You can also check out a working PHP sample app I created: http://pastebin.com/zE77qzAz
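For reference, the sample body shown above can be produced in PHP by encoding an array once with json_encode(). This is a minimal sketch using only the field names and values from that sample; how the body has to be sent (headers, endpoint) is not covered here.

```php
<?php
// Field names and values are taken from the sample body in this answer.
$meeting = array(
    'subject'            => 'test meeting',
    'starttime'          => '2012-02-01T08:00:00',
    'endtime'            => '2012-02-01T09:00:00',
    'timezonekey'        => '67',
    'meetingtype'        => 'Scheduled',
    'passwordrequired'   => 'false',
    'conferencecallinfo' => 'Hybrid',
);

// Encode the array once. Calling json_encode() on a string that is
// already JSON would wrap it in quotes instead of producing an object.
$body = json_encode($meeting);
echo $body, "\n";
```

Passing $body to CURLOPT_POSTFIELDS together with a Content-Type: application/json header is an assumption about what the API expects, so check it against the current documentation.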
Any idea how one would update a user's Twitter status with an image - using the Twitter-Async class?
This is what I have
$twitter = new Twitter(CONSUMER_KEY, CONSUMER_SECRET,$_SESSION['oauth_token'],$_SESSION['oauth_token_secret']);
$array = array('media[]' => '@/img/1.jpg','status' => $status);
$twitter->post('/statuses/update_with_media.json', $array);
With thanks to @billythekid, I have managed to do this. This is what you need to do:
Look these functions up in the EpiOAuth file and see what I've added and alter it where necessary.
EpiOAuth.php
//I have this on line 24
protected $mediaUrl = 'https://upload.twitter.com';
//and altered getApiUrl() to include check for such (you may wish to make this a regex in keeping with the rest?)
private function getApiUrl($endpoint)
{
if(strpos($endpoint,"with_media") > 0)
return "{$this->mediaUrl}/{$this->apiVersion}{$endpoint}";
elseif(preg_match('#^/(trends|search)[./]?(?=(json|daily|current|weekly))#', $endpoint))
return "{$this->searchUrl}{$endpoint}";
elseif(!empty($this->apiVersion))
return "{$this->apiVersionedUrl}/{$this->apiVersion}{$endpoint}";
else
return "{$this->apiUrl}{$endpoint}";
}
// add urldecode if post is multiPart (otherwise tweet is encoded)
protected function httpPost($url, $params = null, $isMultipart)
{
$this->addDefaultHeaders($url, $params['oauth']);
$ch = $this->curlInit($url);
curl_setopt($ch, CURLOPT_POST, 1);
// php's curl extension automatically sets the content type
// based on whether the params are in string or array form
if ($isMultipart) {
$params['request']['status'] = urldecode($params['request']['status']);
}
if($isMultipart)
curl_setopt($ch, CURLOPT_POSTFIELDS, $params['request']);
else
curl_setopt($ch, CURLOPT_POSTFIELDS, $this->buildHttpQueryRaw($params['request']));
$resp = $this->executeCurl($ch);
$this->emptyHeaders();
return $resp;
}
Post image
// how to post image
$twitter = new Twitter(CONSUMER_KEY, CONSUMER_SECRET,$_SESSION['oauth_token'],$_SESSION['oauth_token_secret']);
$array = array('@media[]' => '@/img/1.jpg','status' => $status);
$twitter->post('/statuses/update_with_media.json', $array);
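A side note for newer PHP versions: the '@/path' string prefix for cURL file uploads was deprecated in PHP 5.5 and is no longer supported as of PHP 7.0, where the CURLFile class takes its place. The sketch below builds the equivalent field array for a plain cURL multipart POST. buildMediaFields() is a hypothetical helper, the image/jpeg MIME type is an assumption, and nothing here confirms that Twitter-Async/EpiOAuth passes a CURLFile object through unchanged.

```php
<?php
// Hypothetical helper: build multipart fields for a plain cURL upload.
function buildMediaFields(string $imagePath, string $status): array
{
    return array(
        // CURLFile replaces the '@/path' string prefix on PHP 5.5 and later.
        // The MIME type is assumed; adjust it to the actual image format.
        'media[]' => new CURLFile($imagePath, 'image/jpeg', basename($imagePath)),
        'status'  => $status,
    );
}
```

An array like this would be handed directly to CURLOPT_POSTFIELDS, which makes cURL send the request as multipart/form-data.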
What I'd like to do is find out what is the last/final URL after following the redirections.
I would prefer not to use cURL. I would like to stick with pure PHP (stream wrappers).
Right now I have a URL (let's say http://domain.test), and I use get_headers() to get specific headers from that page. get_headers will also return multiple Location: headers (see Edit below). Is there a way to use those headers to build the final URL? or is there a PHP function that would automatically do this?
Edit: get_headers() follows redirections and returns all the headers for each response/redirections, so I have all the Location: headers.
function getRedirectUrl ($url) {
stream_context_set_default(array(
'http' => array(
'method' => 'HEAD'
)
));
$headers = get_headers($url, 1);
if ($headers !== false && isset($headers['Location'])) {
return $headers['Location'];
}
return false;
}
Additionally...
As was mentioned in a comment, the final item in $headers['Location'] will be your final URL after all redirects. It's important to note, though, that it won't always be an array. Sometimes it's just a run-of-the-mill, non-array variable. In this case, trying to access the last array element will most likely return a single character. Not ideal.
If you are only interested in the final URL, after all the redirects, I would suggest changing
return $headers['Location'];
to
return is_array($headers['Location']) ? array_pop($headers['Location']) : $headers['Location'];
... which is just shorthand for
if(is_array($headers['Location'])){
return array_pop($headers['Location']);
}else{
return $headers['Location'];
}
This fix will take care of either case (array, non-array), and remove the need to weed-out the final URL after calling the function.
In the case where there are no redirects, the function will return false. Similarly, the function will also return false for invalid URLs (invalid for any reason). Therefore, it is important to check the URL for validity before running this function, or else incorporate the redirect check somewhere into your validation.
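The array versus string handling described above can be isolated into a small function that works on the array get_headers($url, 1) returns, which also makes it easy to test without a network. lastLocation() is a hypothetical name used only for this sketch.

```php
<?php
// Hypothetical helper: $headers is the value returned by get_headers($url, 1).
function lastLocation($headers)
{
    if (!is_array($headers) || !isset($headers['Location'])) {
        return false; // request failed, or there was no redirect
    }
    $location = $headers['Location'];
    // Several redirects produce an array; a single redirect produces a string.
    return is_array($location) ? end($location) : $location;
}
```

Like the function above, this returns false both when the request failed and when there was no redirect, so the caller still has to tell those two cases apart.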
/**
* get_redirect_url()
* Gets the address that the provided URL redirects to,
* or FALSE if there's no redirect.
*
* @param string $url
* @return string
*/
function get_redirect_url($url){
$redirect_url = null;
$url_parts = @parse_url($url);
if (!$url_parts) return false;
if (!isset($url_parts['host'])) return false; //can't process relative URLs
if (!isset($url_parts['path'])) $url_parts['path'] = '/';
$sock = fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30);
if (!$sock) return false;
$request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?'.$url_parts['query'] : '') . " HTTP/1.1\r\n";
$request .= 'Host: ' . $url_parts['host'] . "\r\n";
$request .= "Connection: Close\r\n\r\n";
fwrite($sock, $request);
$response = '';
while(!feof($sock)) $response .= fread($sock, 8192);
fclose($sock);
if (preg_match('/^Location: (.+?)$/m', $response, $matches)){
if ( substr($matches[1], 0, 1) == "/" )
return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
else
return trim($matches[1]);
} else {
return false;
}
}
/**
* get_all_redirects()
* Follows and collects all redirects, in order, for the given URL.
*
* @param string $url
* @return array
*/
function get_all_redirects($url){
$redirects = array();
while ($newurl = get_redirect_url($url)){
if (in_array($newurl, $redirects)){
break;
}
$redirects[] = $newurl;
$url = $newurl;
}
return $redirects;
}
/**
* get_final_url()
* Gets the address that the URL ultimately leads to.
* Returns $url itself if it isn't a redirect.
*
* @param string $url
* @return string
*/
function get_final_url($url){
$redirects = get_all_redirects($url);
if (count($redirects)>0){
return array_pop($redirects);
} else {
return $url;
}
}
And, as always, give credit:
http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/
While the OP wanted to avoid cURL, it's best to use it when it's available. Here's a solution which has the following advantages:
uses cURL for all the heavy lifting, so it works with HTTPS
copes with servers which return a lower-cased location header name (both xaav's and webjay's answers do not handle this)
allows you to control how deep you want to go before giving up
Here's the function:
function findUltimateDestination($url, $maxRequests = 10)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRequests);
curl_setopt($ch, CURLOPT_TIMEOUT, 15);
//customize user agent if you desire...
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Link Checker)');
curl_setopt($ch, CURLOPT_URL, $url);
curl_exec($ch);
$url=curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close ($ch);
return $url;
}
Here's a more verbose version which allows you to inspect the redirection chain rather than let curl follow it.
function findUltimateDestination($url, $maxRequests = 10)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 15);
//customize user agent if you desire...
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Link Checker)');
while ($maxRequests--) {
//fetch
curl_setopt($ch, CURLOPT_URL, $url);
$response = curl_exec($ch);
//try to determine redirection url
$location = '';
if (in_array(curl_getinfo($ch, CURLINFO_HTTP_CODE), [301, 302, 303, 307, 308])) {
if (preg_match('/Location:(.*)/i', $response, $match)) {
$location = trim($match[1]);
}
}
if (empty($location)) {
//we've reached the end of the chain...
return $url;
}
//build next url
if ($location[0] == '/') {
$u = parse_url($url);
$url = $u['scheme'] . '://' . $u['host'];
if (isset($u['port'])) {
$url .= ':' . $u['port'];
}
$url .= $location;
} else {
$url = $location;
}
}
return null;
}
As an example of redirection chain which this function handles, but the others do not, try this:
echo findUltimateDestination('http://dx.doi.org/10.1016/j.infsof.2016.05.005');
At the time of writing, this involves 4 requests, with a mixture of Location and location headers involved.
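The case-insensitive header matching can be checked in isolation against raw header text. The sketch below is a slightly stricter variant of the pattern used in the function, not the original: it anchors the match to the start of a line, so a header such as Content-Location is not mistaken for Location. extractLocation() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: find the first Location header in a raw header block.
function extractLocation(string $rawHeaders): ?string
{
    // ^ and $ with the m flag anchor the match to a single header line;
    // the i flag makes the header name case-insensitive.
    if (preg_match('/^Location:[ \t]*(.*?)[ \t]*\r?$/mi', $rawHeaders, $match)) {
        return $match[1] !== '' ? $match[1] : null;
    }
    return null;
}
```

The optional \r before the end of the line strips the carriage return that HTTP header lines carry, so the returned value needs no extra trim().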
xaav's answer is very good, except for the following two issues:
It does not support the HTTPS protocol => A solution was proposed as a comment on the original site: http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/
Some sites will not work since they will not recognise the underlying user agent (client browser)
=> This is simply fixed by adding a User-Agent header field. I added an Android user agent (you can find other user agent examples to suit your needs at http://www.useragentstring.com/pages/useragentstring.php):
$request .= "User-Agent: Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30\r\n";
Here's the modified answer:
/**
* get_redirect_url()
* Gets the address that the provided URL redirects to,
* or FALSE if there's no redirect.
*
* @param string $url
* @return string
*/
function get_redirect_url($url){
$redirect_url = null;
$url_parts = @parse_url($url);
if (!$url_parts) return false;
if (!isset($url_parts['host'])) return false; //can't process relative URLs
if (!isset($url_parts['path'])) $url_parts['path'] = '/';
$sock = fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30);
if (!$sock) return false;
$request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?'.$url_parts['query'] : '') . " HTTP/1.1\r\n";
$request .= 'Host: ' . $url_parts['host'] . "\r\n";
$request .= "User-Agent: Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30\r\n";
$request .= "Connection: Close\r\n\r\n";
fwrite($sock, $request);
$response = '';
while(!feof($sock)) $response .= fread($sock, 8192);
fclose($sock);
if (preg_match('/^Location: (.+?)$/m', $response, $matches)){
if ( substr($matches[1], 0, 1) == "/" )
return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
else
return trim($matches[1]);
} else {
return false;
}
}
/**
* get_all_redirects()
* Follows and collects all redirects, in order, for the given URL.
*
* @param string $url
* @return array
*/
function get_all_redirects($url){
$redirects = array();
while ($newurl = get_redirect_url($url)){
if (in_array($newurl, $redirects)){
break;
}
$redirects[] = $newurl;
$url = $newurl;
}
return $redirects;
}
/**
* get_final_url()
* Gets the address that the URL ultimately leads to.
* Returns $url itself if it isn't a redirect.
*
* @param string $url
* @return string
*/
function get_final_url($url){
$redirects = get_all_redirects($url);
if (count($redirects)>0){
return array_pop($redirects);
} else {
return $url;
}
}
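The code above still opens a plain TCP socket on port 80, so the HTTPS issue is not addressed by it. One common way to handle it with fsockopen() is the ssl:// transport prefix together with port 443. The sketch below only computes the host and port to pass to fsockopen(); socketTarget() is a hypothetical helper, and this may or may not be the approach suggested in the comment on the original site.

```php
<?php
// Hypothetical helper: work out the fsockopen() host and port for a URL.
function socketTarget(string $url)
{
    $parts = parse_url($url);
    if (!$parts || !isset($parts['host'])) {
        return false; // relative or malformed URL
    }
    $isHttps = isset($parts['scheme']) && strtolower($parts['scheme']) === 'https';
    // The ssl:// transport prefix makes fsockopen() negotiate TLS
    // (requires PHP built with OpenSSL support).
    $host = ($isHttps ? 'ssl://' : '') . $parts['host'];
    $port = isset($parts['port']) ? (int) $parts['port'] : ($isHttps ? 443 : 80);
    return array($host, $port);
}
```

The returned pair would replace the first two arguments of the fsockopen() call, while the Host request header keeps using the plain host name without the ssl:// prefix.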
I added two cases to the code from the answers by @xaav and @Houssem BDIOUI: a 404 error and a URL that gives no response. In those cases get_final_url($url) returns the strings 'Error: 404 Not Found' and 'Error: No Responce'.
/**
* get_redirect_url()
* Gets the address that the provided URL redirects to,
* or FALSE if there's no redirect,
* or 'Error: No Responce',
* or 'Error: 404 Not Found'
*
* @param string $url
* @return string
*/
function get_redirect_url($url)
{
$redirect_url = null;
$url_parts = @parse_url($url);
if (!$url_parts)
return false;
if (!isset($url_parts['host']))
return false; //can't process relative URLs
if (!isset($url_parts['path']))
$url_parts['path'] = '/';
$sock = @fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30);
if (!$sock) return 'Error: No Responce';
$request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?' . $url_parts['query'] : '') . " HTTP/1.1\r\n";
$request .= 'Host: ' . $url_parts['host'] . "\r\n";
$request .= "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36\r\n";
$request .= "Connection: Close\r\n\r\n";
fwrite($sock, $request);
$response = '';
while (!feof($sock))
$response .= fread($sock, 8192);
fclose($sock);
if (stripos($response, '404 Not Found') !== false)
{
return 'Error: 404 Not Found';
}
if (preg_match('/^Location: (.+?)$/m', $response, $matches))
{
if (substr($matches[1], 0, 1) == "/")
return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
else
return trim($matches[1]);
} else
{
return false;
}
}
/**
* get_all_redirects()
* Follows and collects all redirects, in order, for the given URL.
*
* @param string $url
* @return array
*/
function get_all_redirects($url)
{
$redirects = array();
while ($newurl = get_redirect_url($url))
{
if (in_array($newurl, $redirects))
{
break;
}
$redirects[] = $newurl;
$url = $newurl;
}
return $redirects;
}
/**
* get_final_url()
* Gets the address that the URL ultimately leads to.
* Returns $url itself if it isn't a redirect,
* or 'Error: No Responce'
* or 'Error: 404 Not Found',
*
* @param string $url
* @return string
*/
function get_final_url($url)
{
$redirects = get_all_redirects($url);
if (count($redirects) > 0)
{
return array_pop($redirects);
} else
{
return $url;
}
}
After hours of reading Stack Overflow and trying out all the custom functions people have written, as well as all the cURL suggestions, none of which followed more than one redirection, I came up with logic of my own that works.
$url = 'facebook.com';
// First let's find out if we just typed the domain name alone or we prepended with a protocol
if (preg_match('/(http|https):\/\/[a-z0-9]+[a-z0-9_\/]*/',$url)) {
$url = $url;
} else {
$url = 'http://' . $url;
echo '<p>No protocol given, defaulting to http://</p>';
}
// Let's print out the initial URL
echo '<p>Initial URL: ' . $url . '</p>';
// Prepare the HEAD method when we send the request
stream_context_set_default(array('http' => array('method' => 'HEAD')));
// Probe for headers
$headers = get_headers($url, 1);
// If there is a Location header, trigger logic
if (isset($headers['Location'])) {
// If there is more than 1 redirect, Location will be array
if (is_array($headers['Location'])) {
// If that's the case, we are interested in the last element of the array (thus the last Location)
echo '<p>Redirected URL: ' . $headers['Location'][array_key_last($headers['Location'])] . '</p>';
$url = $headers['Location'][array_key_last($headers['Location'])];
} else {
// If it's not an array, it means there is only 1 redirect
//var_dump($headers['Location']);
echo '<p>Redirected URL: ' . $headers['Location'] . '</p>';
$url = $headers['Location'];
}
} else {
echo '<p>URL: ' . $url . '</p>';
}
// You can now send get_headers to the latest location
$headers = get_headers($url, 1);
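The protocol check at the top of the script can be pulled out into a small function. This sketch is a variant, not the original pattern: it anchors the match to the start of the string, so a URL that merely contains http:// in its query string is not treated as already having a protocol. ensureScheme() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: prepend http:// when the URL has no http(s) scheme.
function ensureScheme(string $url): string
{
    // ^ anchors the check to the start of the string; i ignores case.
    return preg_match('#^https?://#i', $url) ? $url : 'http://' . $url;
}

echo ensureScheme('facebook.com'), "\n";
// http://facebook.com
```

Using # as the pattern delimiter avoids having to escape the slashes in the scheme.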