Downloading videos from YouTube with PHP - still possible?

I was wondering whether, after all the changes YouTube has made in recent months, it is still possible to write a script that downloads videos.
I'm aware that the methods that worked a year ago are no longer relevant.

Yes, there is still a way. I'm not going to write the whole script for you, but I'll at least write you the function that provides the download link from YouTube.
Alright, you're going to need cURL installed to do this.
/* Developed by User - WebEntrepreneur # StackOverflow.com */
function ___get_youtube_video($youtube_url)
{
if(stripos($youtube_url, 'youtube.com') !== false) // eregi() was removed in PHP 7
{
preg_match('/https?:\/\/(.+)youtube\.com\/watch(.+)?v=(.+)/', $youtube_url, $youtube_id_regex);
$youtube_id = isset($youtube_id_regex[3]) ? $youtube_id_regex[3] : '';
if(!$youtube_id)
{
return INVALID_YOUTUBE_ID;
}
if(strpos($youtube_id, '&') !== false)
{
// Keep only the ID itself; drop any further query parameters.
$youtube_id_m = explode('&', $youtube_id);
$youtube_id = $youtube_id_m[0];
}
} else {
$youtube_id = ($youtube_url);
}
$ping = ___get_curl("http://www.youtube.com/watch?v={$youtube_id}&feature=youtu.be");
if(!$ping)
{
return YOUTUBE_UNAVAILABLE;
}
$ping_scan = nl2br($ping);
$ping_scan = explode('<br />', $ping_scan);
if(empty($ping_scan[36]) or stripos($ping_scan[36], '= null') !== false)
{
return YOUTUBE_TOO_MANY_REQUESTS;
}
$ping_scan = str_replace("\n", "", $ping_scan[36]);
$ping_scan = str_replace(" img.src = '", "", $ping_scan);
$out[1] = str_replace("';", "", $ping_scan);
$sub_ping = ___get_curl("http://gdata.youtube.com/feeds/api/videos/{$youtube_id}");
preg_match('/<title type=\'text\'>(.+)<\/title>/', $sub_ping, $inout);
$inout = $inout[1];
if(!$out[1])
{
return VERSION_EXPIRED;
}
$out[1] = str_replace("generate_204", "videoplayback", $out[1]);
$out[1] = str_replace("\\/", "/", $out[1]);
$out[1] = rawurldecode($out[1]);
if($inout)
{
$out[1] .= "&file=".urlencode($inout).".mp4";
$filename = urlencode($inout).".mp4";
// Only send the download header when a title (and hence a filename) was found.
header("Content-Disposition: attachment; filename=\"".$filename."\"");
}
flush();
return ___get_curl($out[1]);
}
function ___get_curl($url)
{
if(!function_exists("curl_setopt"))
{
return CURL_NOT_INSTALLED;
}
$ch=curl_init();
curl_setopt($ch,CURLOPT_USERAGENT, "YouTube Video Downloader");
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,0);
curl_setopt($ch,CURLOPT_HEADER,0);
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch,CURLOPT_TIMEOUT,100000);
$curl_output=curl_exec($ch);
$curlstatus=curl_getinfo($ch);
curl_close($ch);
return $curl_output;
}
Now, let me talk about the code. After around 30 minutes of research I managed to work out their algorithm; however, it has very strict boundaries, which they have now put in place.
Because each video download request is tied to the requesting IP address, you need to stream the video through your own server's bandwidth, which is what the script above does. Sites like keepvid.com, however, use Java to grab the download URL and stream it to the user. I've also put in my own YouTube ID grabber, which is very handy for this kind of tool.
Please bear in mind that, as stated, YouTube changes its algorithms regularly, and you use this tool at your own risk. I am not liable for any damage done to YouTube.
Hope it gives you a good starting point; I spent a while making it.

Related

How do I store mostly static data from a JSON api?

My PHP project uses the reddit JSON API to grab the title of the current page's submission.
Right now I am running some code every time the page is loaded, and I'm running into some problems, even though there is no real API limit.
I would like to store the title of the submission locally somehow. The site is running on AppFog. What is the best way to do this?
This is my current code:
<?php
/* settings */
$url="http://".$_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];
$reddit_url = 'http://www.reddit.com/api/info.{format}?url='.$url;
$format = 'json'; //use XML if you'd like...JSON FTW!
$title = '';
/* action */
$content = get_url(str_replace('{format}',$format,$reddit_url)); //again, can be xml or json
if($content) {
if($format == 'json') {
$json = json_decode($content,true);
foreach($json['data']['children'] as $child) { // we want all children for this example
$title= $child['data']['title'];
}
}
}
/* output */
/* utility function: go get it! */
function get_url($url) {
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,1);
$content = curl_exec($ch);
curl_close($ch);
return $content;
}
?>
Thanks!
Introduction
Here is a modified version of your code
$url = "http://stackoverflow.com/";
$loader = new Loader();
$loader->parse($url);
printf("<h4>New List : %d</h4>", count($loader));
printf("<ul>");
foreach ( $loader as $content ) {
printf("<li>%s</li>", $content['title']);
}
printf("</ul>");
Output
New List : 7
- New podcast from Joel Spolsky and Jeff Atwood.
- Good site for example code/ Pyhton
- stackoverflow.com has clearly the best Web code ever conceived in the history of the Internet and reddit should better start copying it.
- A reddit-like, OpenID using website for programmers
- Great developer site. Get your questions answered and by someone who knows.
- Stack Overflow launched into public
- Stack Overflow, a programming Q & A site. & Reddit could learn a lot from their interface!
Simple Demo
The Problem
I see some things you want to achieve here namely
I would like to store the title of the submission locally somehow
Right now I am doing running some code every time the page is loaded
From what I understand, what you need is a simple cached copy of your data, so that you don't have to load the URL every time.
Simple Solution
A simple cache system you can use is Memcache.
Example A
$url = "http://stackoverflow.com/";
// Start cache
$m = new Memcache();
$m->addserver("localhost");
$cache = $m->get(sha1($url));
if ($cache) {
// Use cache copy
$loader = $cache;
printf("<h2>Cache List: %d</h2>", count($loader));
} else {
// Start a new Loader
$loader = new Loader();
$loader->parse($url);
printf("<h2>New List : %d</h2>", count($loader));
$m->set(sha1($url), $loader);
}
// Output all listings
printf("<ul>");
foreach ( $loader as $content ) {
printf("<li>%s</li>", $content['title']);
}
printf("</ul>");
Example B
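You can use the last modification date as the cache key, so that you only save a new copy if the document has been modified. This relies on the server sending a Last-Modified header; the Date header changes on every response, so it would never produce a cache hit.
$headers = get_headers(sprintf("http://www.reddit.com/api/info.json?url=%s",$url), true);
// Last modification date, if the server reports one
$time = isset($headers['Last-Modified']) ? strtotime($headers['Last-Modified']) : false;
$cache = $time ? $m->get($time) : false;
if ($cache) {
$loader = $cache;
}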
Since your class implements JsonSerializable you can json encode your result and also store in a Database like MongoDB or MySQL
$data = json_encode($loader);
// Save to DB
Class Used
class Loader implements IteratorAggregate, Countable, JsonSerializable {
private $request = "http://www.reddit.com/api/info.json?url=%s";
private $data = array();
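private $total = 0; // count() must return an int even before parse() has run
private $type; // returned by getType(); never set in this example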
function parse($url) {
$content = json_decode($this->getContent(sprintf($this->request, $url)), true);
$this->data = array_map(function ($v) {
return $v['data'];
}, $content['data']['children']);
$this->total = count($this->data);
}
public function getIterator() {
return new ArrayIterator($this->data);
}
public function count() {
return $this->total;
}
public function getType() {
return $this->type;
}
public function jsonSerialize() {
return $this->data;
}
function getContent($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 1);
$content = curl_exec($ch);
curl_close($ch);
return $content;
}
}
I'm not sure what your question is exactly, but the first thing that stands out is the following:
foreach($json['data']['children'] as $child) { // we want all children for this example
$title= $child['data']['title'];
}
Are you sure you want to overwrite $title? In effect, that will only hold the last $child title.
Now, to your question. I assume you're looking for some kind of mechanism to cache the contents of the requested URL so you don't have to re-issue the request every time, am I right? I don't have any experience with appFog, only with orchestra.io but I believe they have the same restrictions regarding writing to files, as in you can only write to temporary files.
My suggestion would be to cache the (processed) response in one of:
- APC shared memory with a short TTL
- temporary files
- a database
You could use the hash of the URL + arguments as the lookup key. Doing this check inside get_url() means you wouldn't need to change any other part of your code, and it would only take ~3 LOC.
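As a rough sketch of that idea using temporary files: the helper name cached_get(), the $fetch callable, the 300-second TTL and the file naming are all my own assumptions, not part of the question's code. In practice you would pass the question's get_url as $fetch.

```php
<?php
// Hypothetical helper: cache whatever $fetch returns under a hash of the URL.
// $fetch is any callable that takes a URL and returns the response body.
function cached_get(string $url, callable $fetch, int $ttl = 300, ?string $dir = null)
{
    $dir  = $dir ?? sys_get_temp_dir();
    $file = $dir . DIRECTORY_SEPARATOR . 'cache_' . sha1($url) . '.txt';

    // Serve the cached copy while it is younger than $ttl seconds.
    if (is_file($file) && time() - filemtime($file) < $ttl) {
        return file_get_contents($file);
    }

    // Otherwise fetch a fresh copy and store it, unless the fetch failed.
    $content = $fetch($url);
    if ($content !== false && $content !== null) {
        file_put_contents($file, $content, LOCK_EX);
    }
    return $content;
}
```

Usage would look like cached_get($reddit_url, 'get_url'), so the rest of the page code stays as it is.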
After this:
if($format == 'json') {
$json = json_decode($content,true);
foreach($json['data']['children'] as $child) { // we want all children for this example
$title = $child['data']['title'];
}
}
}
Then store it in a JSON file and put it in a folder under your website path:
$storeTitle = array('title' => $title);
$fp = fopen('../pathToJsonFile/title.json', 'w');
fwrite($fp, json_encode($storeTitle));
fclose($fp);
Next time, you can read the JSON file, decode it, and extract the title into a variable for use.
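Reading the stored title back could look roughly like this. The function name read_stored_title() is made up for illustration, and the path you pass in is whatever you used when writing the file.

```php
<?php
// Hypothetical helper: read the title back from the JSON file written earlier.
function read_stored_title(string $path): ?string
{
    if (!is_readable($path)) {
        return null; // nothing has been stored yet
    }
    $data = json_decode(file_get_contents($path), true);
    return isset($data['title']) ? $data['title'] : null;
}
```

A null result tells the caller to fall back to the API request and write the file again.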
I usually just store the data as-is in a flat file, like so:
<?php
define('TEMP_DIR', 'temp/');
define('TEMP_AGE', 3600);
function getinfo($url) {
$temp = TEMP_DIR . urlencode($url) . '.json';
if(!file_exists($temp) OR time() - filemtime($temp) > TEMP_AGE) {
$info = "http://www.reddit.com/api/info.json?url=" . urlencode($url); // encode the URL so its own query string survives
$json = file_get_contents($info);
file_put_contents($temp, $json);
}
else {
$json = file_get_contents($temp);
}
$json = json_decode($json, true);
$titles = array();
foreach($json['data']['children'] as $child) {
$titles[] = $child['data']['title'];
}
return $titles;
}
$test = getinfo('http://imgur.com/');
print_r($test);
PS.
I use file_get_contents to get the JSON data; you might have your own reasons to use cURL.
Also, I don't check the format, because you clearly prefer JSON.

How can I retrieve YouTube video details from video URL using PHP?

Using PHP, how can I get video information like title, description, thumbnail from a youtube video URL e.g.
http://www.youtube.com/watch?v=B4CRkpBGQzU
You can get the data from YouTube's oEmbed interface in two formats: XML and JSON.
Interface address: http://www.youtube.com/oembed?url=youtubeurl&format=json
Use this PHP function to get the data:
function get_youtube($url){
$youtube = "http://www.youtube.com/oembed?url=". urlencode($url) ."&format=json"; // encode, or the video URL's own & and = break the query
$curl = curl_init($youtube);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$return = curl_exec($curl);
curl_close($curl);
return json_decode($return, true);
}
$url = 'http://www.youtube.com/watch?v=B4CRkpBGQzU'; // YouTube video URL
// Display Data
print_r(get_youtube($url));
Don't forget to enable extension=php_curl.dll in your php.ini
This returns metadata about a video:
http://www.youtube.com/oembed?url={videoUrlHere}&format=json
Using your example, a call to:
http://www.youtube.com/oembed?url=http://www.youtube.com/watch?v=B4CRkpBGQzU&format=json
Returns the following, which you can digest and parse with PHP:
{
"provider_url": "http:\/\/www.youtube.com\/",
"thumbnail_url": "http:\/\/i3.ytimg.com\/vi\/B4CRkpBGQzU\/hqdefault.jpg",
"title": "Joan Osborne - One Of Us",
"html": "\u003ciframe width=\"459\" height=\"344\" src=\"http:\/\/www.youtube.com\/embed\/B4CRkpBGQzU?fs=1\u0026feature=oembed\" frameborder=\"0\" allowfullscreen\u003e\u003c\/iframe\u003e",
"author_name": "jzsdhk",
"height": 344,
"thumbnail_width": 480,
"width": 459,
"version": "1.0",
"author_url": "http:\/\/www.youtube.com\/user\/jzsdhk",
"provider_name": "YouTube",
"type": "video",
"thumbnail_height": 360
}
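For example, pulling the title, author and thumbnail out of that response might look like this. It is only a sketch: the function name oembed_summary() is my own, and the sample string is a trimmed copy of the response above.

```php
<?php
// Hypothetical helper: pick a few fields out of an oEmbed JSON response.
function oembed_summary(string $json): ?array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['title'])) {
        return null; // not a usable oEmbed response
    }
    return [
        'title'     => $data['title'],
        'author'    => $data['author_name'] ?? null,
        'thumbnail' => $data['thumbnail_url'] ?? null,
    ];
}

// Trimmed copy of the response shown above.
$response = <<<'JSON'
{
"thumbnail_url": "http:\/\/i3.ytimg.com\/vi\/B4CRkpBGQzU\/hqdefault.jpg",
"title": "Joan Osborne - One Of Us",
"author_name": "jzsdhk"
}
JSON;

$summary = oembed_summary($response);
printf("%s (by %s)\n", $summary['title'], $summary['author']);
```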
Another URL API that can be helpful is: https://www.youtube.com/get_video_info?video_id=B4CRkpBGQzU
video_id is the "v" argument of the YouTube URL. The result is a dictionary in URL-encoded format (key1=value1&key2=value2&...).
This is an undocumented API that has existed for a long time, so exploring it is up to the developer. I am aware of "status" (ok/fail), "errorcode" (100 and 150 in my experience), and "reason" (a string description of the error). I get the duration ("length_seconds") this way because oEmbed does not provide it (strange, but true), and I can hardly persuade every employer to get keys from YouTube to use the official API.
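A URL-encoded dictionary like that can be decoded with parse_str(). The response body below is made up for illustration; only the key names come from the description above.

```php
<?php
// Made-up response body; only the key names are taken from the answer above.
$body = 'status=fail&errorcode=150&reason=This+video+is+not+available';

// parse_str() decodes a URL-encoded string into an associative array.
parse_str($body, $info);

if (isset($info['status']) && $info['status'] === 'ok') {
    printf("Duration: %d seconds\n", isset($info['length_seconds']) ? $info['length_seconds'] : 0);
} else {
    printf("Error %s: %s\n", $info['errorcode'], $info['reason']);
}
```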
To get a YouTube video's description, thumbnails in different sizes, tags, etc., use the newer v3 API on googleapis.com.
Sample URL: https://www.googleapis.com/youtube/v3/videos?part=snippet&id=7wtfhZwyrcc&key=my_google_api_key
$google_api_key = 'google_api_key';
$video_id = 'youtube_video_id';
$api_url = 'https://www.googleapis.com/youtube/v3/videos?part=snippet&id='.$video_id.'&key='.$google_api_key;
$json = file_get_contents($api_url);
$obj = json_decode($json);
var_dump($obj);
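Rather than dumping the whole object, you can pick out the fields you need. This is a sketch that assumes the usual shape of a videos response with part=snippet (an items array whose entries carry a snippet); the function name video_snippet() and the sample values are invented for illustration.

```php
<?php
// Hypothetical helper: extract common fields from a v3 "videos" response (part=snippet).
function video_snippet(string $json): ?array
{
    $data = json_decode($json, true);
    if (empty($data['items'][0]['snippet'])) {
        return null; // unknown video ID, bad key, or malformed response
    }
    $s = $data['items'][0]['snippet'];
    return [
        'title'       => $s['title'] ?? null,
        'description' => $s['description'] ?? null,
        'thumbnail'   => $s['thumbnails']['high']['url'] ?? null,
        'tags'        => $s['tags'] ?? [],
    ];
}

// Invented sample data in the assumed response shape.
$sample = <<<'JSON'
{"items": [{"id": "abc", "snippet": {
  "title": "Sample title",
  "description": "Sample description",
  "thumbnails": {"high": {"url": "https://example.com/hq.jpg", "width": 480, "height": 360}},
  "tags": ["one", "two"]
}}]}
JSON;

print_r(video_snippet($sample));
```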
I wrote this class to display a Youtube video on our website:
<?php
/*
Copyright (c) 2013 ReFri Software / Internet Publication
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following
conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this
list of conditions and the following disclaimer in the documentation and/or
other materials provided with the distribution.
* Neither the name of ReFri Software / Internet Publicion nor the names of its contributors
may be used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
youtube.php
Author : Cynthia Fridsma
http://www.heathernova.us
Date : February 4, 2013
Version: 1.1
Property's width and height are now private
Added: variable error message
*/
class YouTube {
public $video;
private $width;
private $height;
public $error = "<b>Video not found, sorry</b>" ;
function __construct ($width=640, $height=360){
$this->width=$width;
$this->height=$height;
if($this->width<220){
// minimum width
$this->width=220;
}
if($this->height<220){
// minimum height
$this->height=220;
}
}
function playVideo(){
// Default video settings:
#
# youtube_list = false
# youtube_video = false
$youtube_list = false;
$youtube_video = false;
// always check contains youtube.com or youtube.be as a reference....
if(stristr($this->video, "youtube.com")==true ||(stristr($this->video, "youtube.be") || (stristr($this->video, "youtu.be")))){
// Test if the video contains a query..
$test = (parse_url($this->video));
if(isset($test['query'])){
$testing = $test['query'];
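// parse_str() without a result array was removed in PHP 8, so read the values explicitly
parse_str($testing, $params);
$v = isset($params['v']) ? $params['v'] : null;
$list = isset($params['list']) ? $params['list'] : null;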
if(isset($v)&&(isset($list))){
// we're dealing with a play list and a selected video.
$test = $list;
$youtube_list = true;
}
if(isset($list) &&(empty($v))){
// we're only dealing with a play list.
$test = $list;
$youtube_list = true;
}
if(isset($v) &&(empty($list))){
// we're only dealing with a single video.
$test = $v;
$youtube_video = true;
}
if(empty($v) &&(empty($list))){
// we're not dealing with a valid request.
$youtube_video = false;
}
} else {
// Apparently we're dealing with a shared link.
$testing =parse_url($this->video, PHP_URL_PATH);
$test = stristr($testing, "/");
$test = substr($test,1);
$youtube_video = true;
}
if($youtube_video==true){
// Display a single video
$play ='<iframe width="'.$this->width.'" height="'.$this->height.'" src="http://www.youtube.com/embed/'.$test.'?rel=0" frameborder="0" allowfullscreen></iframe>';
}
if($youtube_list==true){
// Display a video play list.
$youtube_video = true;
$play = '<iframe width="'.$this->width.'" height="'.$this->height.'" src="http://www.youtube.com/embed/videoseries?list='.$test.'" frameborder="0" allowfullscreen></iframe>';
}
if($youtube_video == false){
// We are unable to determine the video.
$play = $this->error;
}
} else {
// This is not a valid youtube request
$play = $this->error;
}
// Return the results
return $play;
}
}
And this is how you can use the class:
<?php
include_once("youtube.php");
$video = new YouTube(320,240);
// playlist
$video->video = "http://www.youtube.com/playlist?list=PL6F5E9FD99326F98C";
$youtube = $video->playVideo();
echo $youtube . "<hr>";
// 1 video
$video->video="http://www.youtube.com/watch?v=uTA5bLxfvwo";
$youtube = $video->playVideo();
echo $youtube;
// shared link
echo "<hr>";
$video = new YouTube(420,315);
$video->video="http://youtu.be/VRl2XFJ_tg4";
$youtube = $video->playVideo();
echo $youtube;
// invalid request
$video->video = "http://wrongvideo.com?nothere";
// Your own error message
$video->error ="<br>Video not found :-(<br>" ;
$youtube = $video->playVideo();
echo $youtube;
?>
demo page:
http://www.heathernova.us/youtube.php
TTFN
Cynthia Fridsma
I know this is an old question, but I wrote a package for this purpose. I'll leave it here in case someone needs it.
composer require smoqadam/youtube-video-info:dev-master
$details = $video->getDetails();
echo $details->getVideoId();
echo $details->getTitle();
echo $details->getThumbnails();
echo $details->getViewCount();
echo $details->getRating();
more info in here
Here is a method that can return several pieces of information:
<?php
extension_loaded('openssl') or die('openssl');
$http['method'] = 'POST';
$http['header'] = 'Content-Type: application/json';
$http['content'] = <<<eof
{
"videoId": "lTGjFpjOIP8", "context": {
"client": {"clientName": "WEB", "clientVersion": "1.19700101"}
}
}
eof;
$con = stream_context_create(['http' => $http]);
$key = '?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8';
echo file_get_contents(
'https://www.youtube.com/youtubei/v1/player' . $key, context: $con
);
From oembed:
$video_url = 'https://www.youtube.com/watch?v=bHQqvYy5KYo';
$url = 'http://www.youtube.com/oembed?format=json&url=' . urlencode($video_url);
$json = json_decode(file_get_contents($url), true);
print_r($json); // it will return details of video
Function to get the video ID from the video URL:
function getYTid($ytURL) {
$ytvIDlen = 11; // This is the length of YouTube's video IDs
// The ID string starts after "v=", which is usually right after
// "youtube.com/watch?" in the URL
$idStarts = strpos($ytURL, "?v=");
// In case the "v=" is NOT right after the "?" (not likely, but I like to keep my
// bases covered), it will be after an "&":
if($idStarts === FALSE)
$idStarts = strpos($ytURL, "&v=");
// If still FALSE, URL doesn't have a vid ID
if($idStarts === FALSE)
die("YouTube video ID not found. Please double-check your URL.");
// Offset the start location to match the beginning of the ID string
$idStarts +=3;
// Get the ID string and return it
$ytvID = substr($ytURL, $idStarts, $ytvIDlen);
return $ytvID;
}
and then
$video = getYTid($videourl);
$video_feed = file_get_contents("http://gdata.youtube.com/feeds/api/videos/$video");
$sxml = new SimpleXmlElement($video_feed);
//set up nodes
$namespaces = $sxml->getNameSpaces(true);
$media = $sxml->children($namespaces['media']);
$yt = $media->children($namespaces['yt']);
$yt_attrs = $yt->duration->attributes();
//vars
$video_title = $sxml->title;
$video_description = $sxml->content;
$video_keywords = $media->group->keywords;
$video_length = $yt_attrs['seconds'];
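The getYTid() function above searches the URL string for "v=". As an alternative sketch, parse_url() and parse_str() can do the splitting, which also covers youtu.be share links. The function name extract_video_id() is made up, and the assumption that IDs are 11 characters from A-Z, a-z, 0-9, "_" and "-" is mine.

```php
<?php
// Hypothetical helper: return the video ID from a YouTube URL, or null if none is found.
function extract_video_id(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return null;
    }
    $host = strtolower($parts['host']);

    if ($host === 'youtu.be') {
        // Share links carry the ID as the path: http://youtu.be/<id>
        $id = ltrim($parts['path'] ?? '', '/');
    } elseif (preg_match('/(^|\.)youtube\.com$/', $host)) {
        // Watch links carry the ID in the "v" query parameter, wherever it appears.
        parse_str($parts['query'] ?? '', $query);
        $id = $query['v'] ?? '';
    } else {
        return null;
    }

    // Assumption: IDs are 11 characters long.
    return is_string($id) && preg_match('/^[A-Za-z0-9_-]{11}$/', $id) ? $id : null;
}
```

Unlike getYTid(), this returns null instead of calling die(), so the caller decides how to report a bad URL.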

Amazon S3 Signed URL in PHP

This is the function pulled out of an old WP plugin for returning a signed Amazon S3 URL, but I can't get it to work! When I visit the signed URL it returns, I get this:
The request signature we calculated does not match the signature you provided. Check your key and signing method.
function s3Url($text) {
$AWS_S3_KEY = 'KEY';
$AWS_S3_SECRET = 'SECRET';
$tag_pattern = '/(\[S3 bucket\=(.*?)\ text\=(.*?)\](.*?)\[\/S3\])/i';
define("AWS_S3_KEY", $AWS_S3_KEY); // replace this with your AWS S3 key
define("AWS_S3_SECRET", $AWS_S3_SECRET); // replace this with your secret key.
$expires = time()+get_option('expire_seconds');
if (preg_match_all ($tag_pattern, $text, $matches)) {
for ($m=0; $m<count($matches[0]); $m++) {
$bucket = $matches[2][$m];
$link_text = $matches[3][$m];
$resource = $matches[4][$m];
$string_to_sign = "GET\n\n\n$expires\n/".str_replace(".s3.amazonaws.com","",$bucket)."/$resource";
//$string_to_sign = "GET\n\n\n{$expires}\n/{$bucket}/{$resource}";
$signature = urlencode(base64_encode((hash_hmac("sha1", utf8_encode($string_to_sign), AWS_S3_SECRET, TRUE))));
$authentication_params = "AWSAccessKeyId=".AWS_S3_KEY;
$authentication_params.= "&Expires={$expires}";
$authentication_params.= "&Signature={$signature}";
$tag_pattern_match = "/(\[S3 bucket\=(.*?)\ text\={$link_text}\]{$resource}\[\/S3\])/i";
if(strlen($link_text) == 0)
{
$link = "http://{$bucket}/{$resource}?{$authentication_params}";
}
else
{
$link = "<a href='http://{$bucket}/{$resource}?{$authentication_params}'>{$link_text}</a>";
}
$text = preg_replace($tag_pattern_match,$link,$text);
}
}
return $text;
}
The example provided in the Amazon AWS PHP SDK (sdk-latest\sdk-1.3.5\sdk-1.3.5\_samples\cli-s3_get_urls_for_uploads.php) contains the following code, which works quite well:
/* Execute our queue of batched requests. This may take a few seconds to a
few minutes depending on the size of the files and how fast your upload
speeds are. */
$file_upload_response = $s3->batch()->send();
/* Since a batch of requests will return multiple responses, let's
make sure they ALL came back successfully using `areOK()` (singular
responses use `isOK()`). */
if ($file_upload_response->areOK())
{
// Loop through the individual filenames
foreach ($individual_filenames as $filename)
{
/* Display a URL for each of the files we uploaded. Since uploads default to
private (you can choose to override this setting when uploading), we'll
pre-authenticate the file URL for the next 5 minutes. */
echo $s3->get_object_url($bucket, $filename, '5 minutes') . PHP_EOL . PHP_EOL;
}
}
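If you would rather not pull in the SDK, the signing in the question can be isolated into a small function. This is only a sketch of the legacy query-string scheme the question's code uses (HMAC-SHA1 over "GET", the expiry time and the resource path); the function name s3_signed_url(), the path-style host and the parameter names are my assumptions, and newer regions may require a different signature version.

```php
<?php
// Hypothetical helper: build a query-string-authenticated GET URL (legacy HMAC-SHA1 scheme).
function s3_signed_url(string $bucket, string $key, string $accessKey, string $secret, int $expires): string
{
    // Encode each path segment but keep the slashes between them,
    // so the signed path matches the path in the final URL exactly.
    $path = implode('/', array_map('rawurlencode', explode('/', $key)));

    // Verb, empty Content-MD5, empty Content-Type, expiry, then the resource.
    $stringToSign = "GET\n\n\n{$expires}\n/{$bucket}/{$path}";
    $signature    = base64_encode(hash_hmac('sha1', $stringToSign, $secret, true));

    // http_build_query() URL-encodes the signature's +, / and = characters.
    $query = http_build_query([
        'AWSAccessKeyId' => $accessKey,
        'Expires'        => $expires,
        'Signature'      => $signature,
    ]);

    return "https://s3.amazonaws.com/{$bucket}/{$path}?{$query}";
}

echo s3_signed_url('my-bucket', 'dir/my file.txt', 'KEY', 'SECRET', time() + 300), PHP_EOL;
```

The bucket here is the bare bucket name, without ".s3.amazonaws.com".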

OpenID Discovery Methods - Yadis VS HTML

Recently, I've begun writing my own PHP OpenID consumer class in order to better understand openID. As a guide, I've been referencing the [LightOpenID Class][1]. For the most part, I understand the code and how OpenID works. My confusion comes when looking at the author's discover function:
function discover($url)
{
if(!$url) throw new ErrorException('No identity supplied.');
# We save the original url in case of Yadis discovery failure.
# It can happen when we'll be lead to an XRDS document
# which does not have any OpenID2 services.
$originalUrl = $url;
# A flag to disable yadis discovery in case of failure in headers.
$yadis = true;
# We'll jump a maximum of 5 times, to avoid endless redirections.
for($i = 0; $i < 5; $i ++) {
if($yadis) {
$headers = explode("\n",$this->request($url, 'HEAD'));
$next = false;
foreach($headers as $header) {
if(preg_match('#X-XRDS-Location\s*:\s*(.*)#', $header, $m)) {
$url = $this->build_url(parse_url($url), parse_url(trim($m[1])));
$next = true;
}
if(preg_match('#Content-Type\s*:\s*application/xrds\+xml#i', $header)) {
# Found an XRDS document, now let's find the server, and optionally delegate.
$content = $this->request($url, 'GET');
# OpenID 2
# We ignore it for MyOpenID, as it breaks sreg if using OpenID 2.0
$ns = preg_quote('http://specs.openid.net/auth/2.0/');
if (preg_match('#<Service.*?>(.*)<Type>\s*'.$ns.'(.*?)\s*</Type>(.*)</Service>#s', $content, $m)
&& !preg_match('/myopenid\.com/i', $this->identity)) {
$content = $m[1] . $m[3];
if($m[2] == 'server') $this->identifier_select = true;
$content = preg_match('#<URI>(.*)</URI>#', $content, $server);
$content = preg_match('#<LocalID>(.*)</LocalID>#', $content, $delegate);
if(empty($server)) {
return false;
}
# Does the server advertise support for either AX or SREG?
$this->ax = preg_match('#<Type>http://openid.net/srv/ax/1.0</Type>#', $content);
$this->sreg = preg_match('#<Type>http://openid.net/sreg/1.0</Type>#', $content);
$server = $server[1];
if(isset($delegate[1])) $this->identity = $delegate[1];
$this->version = 2;
$this->server = $server;
return $server;
}
# OpenID 1.1
$ns = preg_quote('http://openid.net/signon/1.1');
if(preg_match('#<Service.*?>(.*)<Type>\s*'.$ns.'\s*</Type>(.*)</Service>#s', $content, $m)) {
$content = $m[1] . $m[2];
$content = preg_match('#<URI>(.*)</URI>#', $content, $server);
$content = preg_match('#<.*?Delegate>(.*)</.*?Delegate>#', $content, $delegate);
if(empty($server)) {
return false;
}
# AX can be used only with OpenID 2.0, so checking only SREG
$this->sreg = preg_match('#<Type>http://openid.net/sreg/1.0</Type>#', $content);
$server = $server[1];
if(isset($delegate[1])) $this->identity = $delegate[1];
$this->version = 1;
$this->server = $server;
return $server;
}
$next = true;
$yadis = false;
$url = $originalUrl;
$content = null;
break;
}
}
if($next) continue;
# There are no relevant information in headers, so we search the body.
$content = $this->request($url, 'GET');
if($location = $this->htmlTag($content, 'meta', 'http-equiv', 'X-XRDS-Location', 'value')) {
$url = $this->build_url(parse_url($url), parse_url($location));
continue;
}
}
if(!$content) $content = $this->request($url, 'GET');
# At this point, the YADIS Discovery has failed, so we'll switch
# to openid2 HTML discovery, then fallback to openid 1.1 discovery.
$server = $this->htmlTag($content, 'link', 'rel', 'openid2.provider', 'href');
$delegate = $this->htmlTag($content, 'link', 'rel', 'openid2.local_id', 'href');
$this->version = 2;
# Another hack for myopenid.com...
if(preg_match('/myopenid\.com/i', $server)) {
$server = null;
}
if(!$server) {
# The same with openid 1.1
$server = $this->htmlTag($content, 'link', 'rel', 'openid.server', 'href');
$delegate = $this->htmlTag($content, 'link', 'rel', 'openid.delegate', 'href');
$this->version = 1;
}
if($server) {
# We found an OpenID2 OP Endpoint
if($delegate) {
# We have also found an OP-Local ID.
$this->identity = $delegate;
}
$this->server = $server;
return $server;
}
throw new ErrorException('No servers found!');
}
throw new ErrorException('Endless redirection!');
}
[1]: http://gitorious.org/lightopenid
Okay, Here's the logic as I understand it (basically):
1. Check to see if the $url sends you a valid XRDS file, which you then parse to figure out the OpenID provider's endpoint. From my understanding, this is called the Yadis authentication method.
2. If no XRDS file is found, check the body of the response for an HTML <link> tag that contains the URL of the endpoint.
What. The. Heck.
I mean seriously? Essentially screen scrape the response and hope you find a link with the appropriate attribute value?
Now, don't get me wrong, this class works like a charm and it's awesome. I'm just failing to grok the two separate methods used to discover the endpoint: XRDS (yadis) and HTML.
My Questions
1. Are those the only two methods used in the discovery process?
2. Is one only used in version 1.1 of OpenID and the other in version 2?
3. Is it critical to support both methods?
4. The site I've encountered the HTML method on is Yahoo. Are they nuts?
Thanks again for your time folks. I apologize if I sound a little flabbergasted, but I was genuinely stunned at the methodology once I began to understand what measures were being taken to find the endPoint.
Specification is your friend.
But answering your question:
1. Yes. Those are the only two methods defined by the OpenID specifications (at least for URLs; there is a third method for XRIs).
2. No, both can be used with both versions of the protocol. Read the function carefully, and you'll see that it supports both methods for both versions.
3. If you want your library to work with every provider and user, you'd better support both. Some users paste the HTML tags into their own sites, so that their site's URL can be used as an OpenID.
4. Some providers even use both methods at once, to maintain compatibility with consumers that don't implement YADIS discovery (which isn't part of OpenID 1.1, but can be used with it). So that does make sense.
And yes, HTML discovery is about searching for a <link> in the response body. That's why it's called HTML discovery.
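To make HTML discovery concrete, here is a rough standalone sketch that uses DOMDocument rather than LightOpenID's htmlTag() helper. The function name discover_from_html() and the shape of the returned array are my own; the rel values are the ones the discover() function in the question looks for.

```php
<?php
// Hypothetical helper: HTML-based discovery over an already fetched page body.
function discover_from_html(string $html): ?array
{
    if (trim($html) === '') {
        return null; // nothing to parse
    }

    $doc  = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    // Map every rel token to its href; rel may hold several space-separated tokens.
    $links = [];
    foreach ($doc->getElementsByTagName('link') as $link) {
        $rels = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($rels as $rel) {
            $links[$rel] = $link->getAttribute('href');
        }
    }

    // Prefer OpenID 2.0, then fall back to OpenID 1.1.
    if (isset($links['openid2.provider'])) {
        return ['version' => 2, 'server' => $links['openid2.provider'],
                'delegate' => $links['openid2.local_id'] ?? null];
    }
    if (isset($links['openid.server'])) {
        return ['version' => 1, 'server' => $links['openid.server'],
                'delegate' => $links['openid.delegate'] ?? null];
    }
    return null;
}
```

It needs PHP's DOM extension, and it only covers the HTML half of discovery; the Yadis/XRDS half is the header loop in the question's code.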

HTTP Digest authenticating in PHP

I want to authenticate to another site using HTTP Digest authorization in a PHP script.
My function receives only the content of the WWW-Authenticate header as a parameter, and I want to generate the correct response (the Authorization header). I have found many examples that explain how to implement this the other way around (a browser authenticating to my script), but not this way. I am missing a function that can parse the WWW-Authenticate header content and generate a response. Is there a standard function or common library that implements this?
OK, no answer yet, so I investigated a Python implementation that was lying around here and rewrote it in PHP. It is the simplest possible piece of code. It supports only MD5 hashing, but it works for me:
function H($param) {
return md5($param);
}
function KD($a,$b) {
return H("$a:$b");
}
function parseHttpDigest($digest) {
$data = array();
$parts = explode(", ", $digest);
foreach ($parts as $element) {
$bits = explode("=", $element, 2); // limit to 2 so values containing "=" stay intact
$data[$bits[0]] = str_replace('"','', $bits[1]);
}
return $data;
}
function response($wwwauth, $user, $pass, $httpmethod, $uri) {
list($dummy_digest, $value) = explode(' ', $wwwauth, 2); // split() was removed in PHP 7
$x = parseHttpDigest($value);
$realm = $x['realm'];
$A1 = $user.":".$realm.":".$pass;
$A2 = $httpmethod.":".$uri;
if ($x['qop'] == 'auth') {
$cnonce = time();
$ncvalue = 1;
$noncebit = $x['nonce'].":".$ncvalue.":".$cnonce.":auth:".H($A2);
$respdig = KD(H($A1), $noncebit);
}else {
# FIX: handle error here
}
$base = 'Digest username="'.$user.'", realm="';
$base .= $x['realm'].'", nonce="'.$x['nonce'].'",';
$base .= ' uri="'.$uri.'", cnonce="'.$cnonce;
$base .= '", nc="'.$ncvalue.'", response="'.$respdig.'", qop="auth"';
return $base;
}
Usage:
# TEST
$www_header = 'Digest realm="TEST", nonce="356f2dbb8ce08174009d53c6f02c401f", algorithm="MD5", qop="auth"';
print response($www_header, "user", "password", "POST", "/my_url_query");
I don't know of a ready-made client-side implementation in PHP; you have to implement the RFC as if your script were the browser, authenticating to a remote server. Wikipedia's page on HTTP Digest has a nice example.
(It's not that hard: a couple of MD5 hashes. Some gotchas I encountered when building the server side: the string delimiter is ":" (colon), and the request method is also part of the hash.)
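To show those "couple of MD5 hashes" in isolation, here is a small sketch of the qop="auth" calculation using the example values from RFC 2617 (the same ones the Wikipedia page uses). The function name digest_response() is my own.

```php
<?php
// Hypothetical helper: the qop="auth" response value, built from three MD5 hashes.
function digest_response(
    string $user, string $realm, string $pass,
    string $method, string $uri,
    string $nonce, string $nc, string $cnonce, string $qop = 'auth'
): string {
    $ha1 = md5("$user:$realm:$pass"); // who is asking
    $ha2 = md5("$method:$uri");       // what is being asked for
    return md5("$ha1:$nonce:$nc:$cnonce:$qop:$ha2");
}

// Example values from RFC 2617, section 3.5.
echo digest_response(
    'Mufasa', 'testrealm@host.com', 'Circle Of Life',
    'GET', '/dir/index.html',
    'dcd98b7102dd2f0e8b11d0f600bfb0c093', '00000001', '0a4f113b'
), "\n";
```

The RFC lists 6629fae49393a05397450978507c4ef1 as the response for these inputs, which makes it a handy check for your own implementation.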
