I am building a feature that lets users enter a YouTube video URL, and I would like to extract the video ID from these URLs.
Does the YouTube API provide a function where I pass in the link and get the video ID back, or do I have to parse the string myself?
I am using PHP. I would appreciate any pointers or code samples.
Thanks
Here is an example function that uses a regular expression to extract the youtube ID from a URL:
/**
* get youtube video ID from URL
*
* @param string $url
* @return string Youtube video id or FALSE if none found.
*/
function youtube_id_from_url($url) {
$pattern =
'%^# Match any youtube URL
(?:https?://)? # Optional scheme. Either http or https
(?:www\.)? # Optional www subdomain
(?: # Group host alternatives
youtu\.be/ # Either youtu.be,
| youtube\.com # or youtube.com
(?: # Group path alternatives
/embed/ # Either /embed/
| /v/ # or /v/
| /watch\?v= # or /watch\?v=
) # End path alternatives.
) # End host alternatives.
([\w-]{10,12}) # Allow 10-12 for 11 char youtube id.
$%x'
;
$result = preg_match($pattern, $url, $matches);
if ($result) {
return $matches[1];
}
return false;
}
echo youtube_id_from_url('http://youtu.be/NLqAF9hrVbY'); # NLqAF9hrVbY
It's an adaptation of the answer from a similar question.
It's not directly the API you're looking for, but it is probably helpful: YouTube has an oEmbed service:
$url = 'http://youtu.be/NLqAF9hrVbY';
var_dump(json_decode(file_get_contents(sprintf('http://www.youtube.com/oembed?url=%s&format=json', urlencode($url)))));
Which provides some more meta-information about the URL:
object(stdClass)#1 (13) {
["provider_url"]=>
string(23) "http://www.youtube.com/"
["title"]=>
string(63) "Hang Gliding: 3 Flights in 8 Days at Northside Point of the Mtn"
["html"]=>
string(411) "<object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/NLqAF9hrVbY?version=3"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/NLqAF9hrVbY?version=3" type="application/x-shockwave-flash" width="425" height="344" allowscriptaccess="always" allowfullscreen="true"></embed></object>"
["author_name"]=>
string(11) "widgewunner"
["height"]=>
int(344)
["thumbnail_width"]=>
int(480)
["width"]=>
int(425)
["version"]=>
string(3) "1.0"
["author_url"]=>
string(39) "http://www.youtube.com/user/widgewunner"
["provider_name"]=>
string(7) "YouTube"
["thumbnail_url"]=>
string(48) "http://i3.ytimg.com/vi/NLqAF9hrVbY/hqdefault.jpg"
["type"]=>
string(5) "video"
["thumbnail_height"]=>
int(360)
}
The ID is not a direct part of the response. However, the response might contain the information you're looking for, and the request can be useful for validating the YouTube URL.
I have made slight changes to the regular expression above. It works fine for YouTube short URLs (as used in the example above) and for simple video URLs where no other parameter follows the video ID, but it does not work for URLs like
http://www.youtube.com/watch?v=B_izAKQ0WqQ&feature=related, where the video ID is not the last parameter.
Likewise, v={video_code} does not always come directly after watch? (the regular expression above assumes it does). For example, if the user has selected a language or location from the footer, such as English (UK) from the Language option, the URL will be http://www.youtube.com/watch?feature=related&hl=en-GB&v=B_izAKQ0WqQ
So I have modified the regular expression above. Credit goes to hakre for providing the base regular expression, thanks @hakre:
function youtube_id_from_url($url) {
$pattern =
'%^# Match any youtube URL
(?:https?://)? # Optional scheme. Either http or https
(?:www\.)? # Optional www subdomain
(?: # Group host alternatives
youtu\.be/ # Either youtu.be,
| youtube\.com # or youtube.com
(?: # Group path alternatives
/embed/ # Either /embed/
| /v/ # or /v/
| .*v= # or any path/query up to v=
) # End path alternatives.
) # End host alternatives.
([\w-]{10,12}) # Allow 10-12 for 11 char youtube id.
($|&).* # if additional parameters are also in query string after video id.
$%x'
;
$result = preg_match($pattern, $url, $matches);
if ($result) { // preg_match() returns 0 (not false) when nothing matched
return $matches[1];
}
return false;
}
You can use the PHP function parse_url to extract host name, path, query string and the fragment. You can then use PHP string functions to locate the video id.
function getYouTubeVideoId($url)
{
$video_id = false;
$url = parse_url($url);
if (strcasecmp($url['host'], 'youtu.be') === 0)
{
#### (dontcare)://youtu.be/<video id>
$video_id = substr($url['path'], 1);
}
elseif (strcasecmp($url['host'], 'www.youtube.com') === 0)
{
if (isset($url['query']))
{
parse_str($url['query'], $url['query']);
if (isset($url['query']['v']))
{
#### (dontcare)://www.youtube.com/(dontcare)?v=<video id>
$video_id = $url['query']['v'];
}
}
if ($video_id == false)
{
$url['path'] = explode('/', substr($url['path'], 1));
if (in_array($url['path'][0], array('e', 'embed', 'v')))
{
#### (dontcare)://www.youtube.com/(whitelist)/<video id>
$video_id = $url['path'][1];
}
}
}
return $video_id;
}
$urls = array(
'http://youtu.be/dQw4w9WgXcQ',
'http://www.youtube.com/?v=dQw4w9WgXcQ',
'http://www.youtube.com/?v=dQw4w9WgXcQ&feature=player_embedded',
'http://www.youtube.com/watch?v=dQw4w9WgXcQ',
'http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=player_embedded',
'http://www.youtube.com/v/dQw4w9WgXcQ',
'http://www.youtube.com/e/dQw4w9WgXcQ',
'http://www.youtube.com/embed/dQw4w9WgXcQ'
);
foreach ($urls as $url)
{
echo sprintf('%s -> %s' . PHP_EOL, $url, getYouTubeVideoId($url));
}
Simple as return substr(strstr($url, 'v='), 2, 11);
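A guarded variant of that one-liner might look like the sketch below. The function name youtube_id_by_substr is made up for illustration. The approach assumes the first "v=" in the string belongs to the video parameter, so it does not cover youtu.be or /embed/ links, and it can pick the wrong text if another parameter name ending in "v" comes first.

```php
<?php
// Hypothetical helper: take the 11 characters that follow the first "v=".
function youtube_id_by_substr(string $url): ?string
{
    $tail = strstr($url, 'v=');      // false when "v=" does not occur at all
    if ($tail === false) {
        return null;
    }
    $id = substr($tail, 2, 11);      // skip "v=", keep up to 11 characters
    // Reject results that are too short to be an 11-character ID.
    return strlen($id) === 11 ? $id : null;
}

var_dump(youtube_id_by_substr('http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=related'));
var_dump(youtube_id_by_substr('http://youtu.be/dQw4w9WgXcQ'));
```

The null return makes the "no v= found" case explicit instead of passing false into substr().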
I know this is a very late answer but I found this thread while searching for the topic so I want to suggest a more elegant way of doing this using oEmbed:
echo get_embed('youtube', 'https://www.youtube.com/watch?v=IdxKPCv0bSs');
function get_embed($provider, $url, $max_width = '', $max_height = ''){
$providers = array(
'youtube' => 'http://www.youtube.com/oembed'
/* you can add support for more providers here */
);
if(!isset($providers[$provider])){
return 'Invalid provider!';
}
$movie_data_json = @file_get_contents(
$providers[$provider] . '?url=' . urlencode($url) .
"&maxwidth={$max_width}&maxheight={$max_height}&format=json"
);
if(!$movie_data_json){
$error = error_get_last();
/* remove the PHP stuff from the error and show only the HTTP error message */
$error_message = preg_replace('/.*: (.*)/', '$1', $error['message']);
return $error_message;
}else{
$movie_data = json_decode($movie_data_json, true);
return $movie_data['html'];
}
}
oEmbed makes it possible to embed content from more sites by just adding their oEmbed API endpoint to the $providers array in the above code.
Here is a simple solution that has worked for me.
The video ID is the longest word in any type of YouTube URL. It consists of word characters and "-", is at least 9 characters long, and is surrounded by non-word characters. So you can search the URL for the regex below and take the capture group of the first match as your answer. The first match is used because some YouTube parameters, such as enablejsapi, are also longer than 8 characters, but they always come after the video ID.
Regex: "\W([\w-]{9,})(\W|$)"
Here is the working java code:
String[] youtubeUrls = {
"https://www.youtube.com/watch?v=UzRtrjyDwx0",
"https://youtu.be/6butf1tEVKs?t=22s",
"https://youtu.be/R46-XgqXkzE?t=2m52s",
"http://youtu.be/dQw4w9WgXcQ",
"http://www.youtube.com/?v=dQw4w9WgXcQ",
"http://www.youtube.com/?v=dQw4w9WgXcQ&feature=player_embedded",
"http://www.youtube.com/watch?v=dQw4w9WgXcQ",
"http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=player_embedded",
"http://www.youtube.com/v/dQw4w9WgXcQ",
"http://www.youtube.com/e/dQw4w9WgXcQ",
"http://www.youtube.com/embed/dQw4w9WgXcQ"
};
String pattern = "\\W([\\w-]{9,})(\\W|$)";
Pattern pattern2 = Pattern.compile(pattern);
for (int i=0; i<youtubeUrls.length; i++){
Matcher matcher2 = pattern2.matcher(youtubeUrls[i]);
if (matcher2.find()){
System.out.println(matcher2.group(1));
}
else System.out.println("Not found");
}
As mentioned in a comment below the valid answer, we use it like this, and it works mighty fine!
function youtube_id_from_url($url) {
$url = trim(strtok("$url", '?'));
$url = str_replace("#!/", "", "$url");
$pattern =
'%^# Match any youtube URL
(?:https?://)? # Optional scheme. Either http or https
(?:www\.)? # Optional www subdomain
(?: # Group host alternatives
youtu\.be/ # Either youtu.be,
| youtube\.com # or youtube.com
(?: # Group path alternatives
/embed/ # Either /embed/
| /v/ # or /v/
| /watch\?v= # or /watch\?v=
) # End path alternatives.
) # End host alternatives.
([\w-]{10,12}) # Allow 10-12 for 11 char youtube id.
$%x'
;
$result = preg_match($pattern, $url, $matches);
if ($result) {
return $matches[1];
}
return false;
}
How about this one:
function getVideoId() {
$query = parse_url($this->url, PHP_URL_QUERY);
$arr = explode('=', $query);
$index = array_search('v', $arr);
if ($index !== false) {
if (isset($arr[$index + 1])) { // the value follows the 'v' key
$string = $arr[$index + 1];
if (($amp = strpos($string, '&')) !== false) {
return substr($string, 0, $amp);
} else {
return $string;
}
} else {
return false;
}
}
return false;
}
No regex, and it supports multiple query parameters as long as v is the first one; e.g., https://www.youtube.com/watch?v=PEQxWg92Ux4&index=9&list=RDMMom0RGEnWIEk also works.
For Java developers
This works for me and also supports no-cookie URLs:
private static final Pattern youtubeId = Pattern.compile("^(?:https?\\:\\/\\/)?.*(?:youtu.be\\/|vi?\\/|vi?=|u\\/\\w\\/|embed\\/|(watch)?vi?=)([^#&?]*).*$");
@VisibleForTesting
String getVideoId(final String url) {
final Matcher matcher = youtubeId.matcher(url);
if(matcher.find()){
return matcher.group(2);
}
return "";
}
Some tests to check YouTube URLs:
@ParameterizedTest
@MethodSource("youtubeTestUrls")
void videoIdFromUrlTest(final String url, final String videoId) {
final String matchedVidID = this.youtubeService.getVideoId(url);
assertEquals(videoId, matchedVidID);
}
private static Stream<Arguments> youtubeTestUrls() {
return Stream.of(
Arguments.of("www.youtube-nocookie.com/embed/dQw4-9W_XcQ?rel=0", "dQw4-9W_XcQ"),
Arguments.of("http://www.youtube.com/user/Scobleizer#p/u/1/dQw4-9W_XcQ", "dQw4-9W_XcQ"),
Arguments.of("http://www.youtube.com/watch?v=dQw4-9W_XcQ&feature=channel", "dQw4-9W_XcQ"),
Arguments.of("http://www.youtube.com/watch?v=dQw4-9W_XcQ&playnext_from=TL&videos=osPknwzXEas&feature=sub", "dQw4-9W_XcQ"),
Arguments.of("http://www.youtube.com/ytscreeningroom?v=dQw4-9W_XcQ", "dQw4-9W_XcQ"),
Arguments.of("http://www.youtube.com/user/SilkRoadTheatre#p/a/u/2/dQw4-9W_XcQ", "dQw4-9W_XcQ"),
Arguments.of("http://youtu.be/dQw4-9W_XcQ", "dQw4-9W_XcQ"),
Arguments.of("http://www.youtube.com/watch?v=dQw4-9W_XcQ&feature=youtu.be", "dQw4-9W_XcQ"),
Arguments.of("http://youtu.be/dQw4-9W_XcQ", "dQw4-9W_XcQ"),
Arguments.of("https://www.youtube.com/user/Scobleizer#p/u/1/dQw4-9W_XcQ?rel=0", "dQw4-9W_XcQ"),
Arguments.of("http://www.youtube.com/watch?v=dQw4-9W_XcQ&playnext_from=TL&videos=dQw4-9W_XcQ&feature=sub", "dQw4-9W_XcQ"),
Arguments.of("http://www.youtube.com/ytscreeningroom?v=dQw4-9W_XcQ", "dQw4-9W_XcQ"),
Arguments.of("http://www.youtube.com/embed/dQw4-9W_XcQ?rel=0", "dQw4-9W_XcQ"),
Arguments.of("https://www.youtube.com/watch?v=dQw4-9W_XcQ", "dQw4-9W_XcQ"),
Arguments.of("http://youtube.com/v/dQw4-9W_XcQ?feature=youtube_gdata_player", "dQw4-9W_XcQ"),
Arguments.of("http://youtube.com/vi/dQw4-9W_XcQ?feature=youtube_gdata_player", "dQw4-9W_XcQ"),
Arguments.of("http://youtube.com/?v=dQw4-9W_XcQ&feature=youtube_gdata_player", "dQw4-9W_XcQ"),
Arguments.of("http://www.youtube.com/watch?v=dQw4-9W_XcQ&feature=youtube_gdata_player", "dQw4-9W_XcQ"),
Arguments.of("http://youtube.com/?vi=dQw4-9W_XcQ&feature=youtube_gdata_player", "dQw4-9W_XcQ"),
Arguments.of("https://youtube.com/watch?v=dQw4-9W_XcQ&feature=youtube_gdata_player", "dQw4-9W_XcQ"),
Arguments.of("http://youtube.com/watch?vi=dQw4-9W_XcQ&feature=youtube_gdata_player", "dQw4-9W_XcQ"),
Arguments.of("http://youtu.be/dQw4-9W_XcQ?feature=youtube_gdata_player", "dQw4-9W_XcQ"),
Arguments.of("https://www.youtube.com/watch?v=yYw2Q141thM&list=PLOwEeBApnYoUFioRitjwz-DREzFGOSgiE&index=2", "yYw2Q141thM"),
Arguments.of("https://www.youtube.com/watch?", "")
);
}
Related
I have this php code to turn youtube urls into videos automatically:
$search = '%
(?:https?://)?
(?:www\.)?
(?:
youtu\.be/
| youtube\.com
(?:
/embed/
| /v/
| /watch\?v=
| /watch\?feature=player_embedded&v=
)
)
([\w\-]{10,12})
\b
%x';
$replace = "<iframe class=\"youtube-player\" width=\"550\" height=\"385\" src=\"http://www.youtube.com/embed/$1\" data-youtube-id=\"$1\" frameborder=\"0\" allowfullscreen></iframe>";
return preg_replace($search, $replace, $url);
What would be the easiest way to strip out anything after the video id?
Wow. The suggested link actually links to another regex. Use parse_url and parse_str, that's what they're there for, see this answer. Parsing URLs with regex is hard and there's no reason to reinvent the wheel.
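As a rough sketch of what that comment suggests, the following reads the v parameter with parse_url and parse_str. The function name youtube_id_via_parse_url is an illustrative assumption, and the sketch only covers URLs that carry the ID in the query string, so youtu.be and /embed/ links would need separate handling.

```php
<?php
// Hypothetical helper: read the "v" query parameter, wherever it appears.
function youtube_id_via_parse_url(string $url): ?string
{
    $query = parse_url($url, PHP_URL_QUERY);   // null when there is no query string
    if (!is_string($query)) {
        return null;
    }
    parse_str($query, $params);                // e.g. ['feature' => '...', 'v' => '...']
    if (isset($params['v']) && is_string($params['v']) && $params['v'] !== '') {
        return $params['v'];
    }
    return null;
}

var_dump(youtube_id_via_parse_url('http://www.youtube.com/watch?feature=player_embedded&v=dQw4w9WgXcQ'));
```

Because the query string is parsed into an array, anything after the ID (such as &feature=...) is separated out without any regex.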
Thanks to the other links, I found a way. Here's a function that searches a body of text and replaces all YouTube links with videos:
function youtube($body)
{
$video_pattern = '~(?:http|https|)(?::\/\/|)(?:www.|)(?:youtu\.be\/|youtube\.com(?:\/embed\/|\/v\/|\/watch\?v=|\/ytscreeningroom\?v=|\/feeds\/api\/videos\/|\/user\S*[^\w\-\s]|\S*[^\w\-\s]))([\w\-]{11})[a-z0-9;:@#?&%=+\/\$_.-]*~i';
preg_match_all($video_pattern, $body, $matches);
//print_r($matches[0]);
foreach ($matches[0] as $url)
{
if (strpos($url, 'feature=youtu.be') !== false || strpos($url, 'youtu.be') === false)
{
parse_str(parse_url($url, PHP_URL_QUERY), $id);
$id = $id['v'];
}
else
{
$id = basename($url);
}
$body = str_replace($url, "<iframe class=\"youtube-player\" width=\"550\" height=\"385\" src=\"http://www.youtube.com/embed/{$id}\" data-youtube-id=\"{$id}\" frameborder=\"0\" allowfullscreen></iframe>", $body);
}
return $body;
}
I need to extract the domain name out of a string which could be anything. Such as:
$sitelink="http://www.somewebsite.com/product/3749875/info/overview.html";
or
$sitelink="http://subdomain.somewebsite.com/blah/blah/whatever.php";
In any case, I'm looking to extract the 'somewebsite.com' portion (which could be anything), and discard the rest.
With parse_url($url)
<?php
$url = 'http://username:password@hostname/path?arg=value#anchor';
print_r(parse_url($url));
?>
The above example will output:
Array
(
[scheme] => http
[host] => hostname
[user] => username
[pass] => password
[path] => /path
[query] => arg=value
[fragment] => anchor
)
Using those values:
echo parse_url($url, PHP_URL_HOST); //hostname
or
$url_info = parse_url($url);
echo $url_info['host'];//hostname
here it is
<?php
$sitelink="http://www.somewebsite.com/product/3749875/info/overview.html";
$domain_pieces = explode(".", parse_url($sitelink, PHP_URL_HOST));
$l = sizeof($domain_pieces);
$secondleveldomain = $domain_pieces[$l-2] . "." . $domain_pieces[$l-1];
echo $secondleveldomain;
Note that this is probably not the behavior you are looking for, because for hosts like
stackoverflow.co.uk
it will echo "co.uk"
see:
http://publicsuffix.org/learn/
http://www.dkim-reputation.org/regdom-libs/
http://www.dkim-reputation.org/regdom-lib-downloads/ <-- downloads here, php included
Two complex URLs:
$url="https://www.example.co.uk/page/section/younameit";
or
$url="https://example.co.uk/page/section/younameit";
To get "www.example.co.uk":
$host=parse_url($url, PHP_URL_HOST);
To get "example.co.uk" only, strip a leading "www.":
// Note: ltrim($host, 'www.') would strip any leading "w" and "." characters, not the literal prefix.
$domain = preg_replace('/^www\./', '', $host);
Whether or not your URL includes "www.", you get the same end result, i.e. "example.co.uk".
Voilà!
You need a package that uses the Public Suffix List; only this way can you correctly extract domains with two- and three-level TLDs (co.uk, a.bg, b.bg, etc.) and multilevel subdomains. Regex, parse_url() or string functions alone will never produce a fully correct result.
I recommend TLD Extract. Here is an example:
$extract = new LayerShifter\TLDExtract\Extract();
$result = $extract->parse('http://www.somewebsite.com/product/3749875/info/overview.html');
$result->getSubdomain(); // will return (string) 'www'
$result->getHostname(); // will return (string) 'somewebsite'
$result->getSuffix(); // will return (string) 'com'
$result->getRegistrableDomain(); // will return (string) 'somewebsite.com'
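To illustrate why a suffix list matters, here is a minimal sketch of the lookup idea. It is not how TLD Extract is implemented. SAMPLE_SUFFIXES is a hand-picked, hypothetical subset used only for demonstration (the real Public Suffix List is far larger and has wildcard and exception rules), and registrable_domain_sketch is an invented name.

```php
<?php
// Hypothetical, tiny subset of the Public Suffix List, for illustration only.
const SAMPLE_SUFFIXES = ['co.uk', 'uk', 'com', 'org'];

function registrable_domain_sketch(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    $labels = explode('.', strtolower($host));
    // Try the longest candidate suffix first, so "co.uk" wins over "uk".
    for ($i = 1; $i < count($labels); $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (in_array($suffix, SAMPLE_SUFFIXES, true)) {
            // The registrable domain is the suffix plus one label to its left.
            return $labels[$i - 1] . '.' . $suffix;
        }
    }
    return null; // suffix not in the sample list
}

var_dump(registrable_domain_sketch('http://www.somewebsite.com/product/3749875/info/overview.html'));
var_dump(registrable_domain_sketch('https://www.example.co.uk/page/section/younameit'));
```

With a real, regularly updated suffix list in place of the sample array, the same longest-suffix-first loop is what separates "example.co.uk" from "co.uk".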
For a string that could be anything, new approach:
function extract_plain_domain($text) {
$text=trim($text,"/");
$text=strtolower($text);
$parts=explode("/",$text);
if (substr_count($parts[0],"http")) {
$parts[0]="";
}
// each() was removed in PHP 8; foreach does the same job here.
foreach ($parts as $val) {
if (!empty($val)) { $text=$val; break; }
}
$parts=explode(".",$text);
if (empty($parts[2])) {
return $parts[0].".".$parts[1];
} else {
$num_parts=count($parts);
return $parts[$num_parts-2].".".$parts[$num_parts-1];
}
} // end function extract_plain_domain
You can use the Utopia Domains library (https://github.com/utopia-php/domains). It returns the domain's TLD and public suffix based on the Mozilla Public Suffix List (https://publicsuffix.org), and it can be used as an alternative to the now-archived TLDExtract package.
Use the parse_url function to get the hostname from your URL, then use the Utopia Domains parser to get the correct suffix and join it with the domain name:
<?php
require_once './vendor/autoload.php';
use Utopia\Domains\Domain;
$url = 'http://demo.example.co.uk/site';
$domain = new Domain(parse_url($url, PHP_URL_HOST)); // demo.example.co.uk
var_dump($domain->get()); // demo.example.co.uk
var_dump($domain->getTLD()); // uk
var_dump($domain->getSuffix()); // co.uk
var_dump($domain->getName()); // example
var_dump($domain->getSub()); // demo
var_dump($domain->isKnown()); // true
var_dump($domain->isICANN()); // true
var_dump($domain->isPrivate()); // false
var_dump($domain->isTest()); // false
var_dump($domain->getName().'.'.$domain->getSuffix()); // example.co.uk
I want to create a custom form validator to check if my user is sending a youtube url.
I've already created my lib/validator/youtubeValidator.class.php
Then I use it in my MyForm.class.php : new YoutubeValidator(........)
Here is the code :
class YoutubeValidator extends sfValidatorUrl
{
protected function configure($options = array(), $messages = array())
{
$this->addMessage('invalid', 'Veuillez entrer un lien Youtube');
}
protected function doClean($url)
{
$pattern =
'%^# Match any youtube URL
(?:https?://)? # Optional scheme. Either http or https
(?:www\.)? # Optional www subdomain
(?: # Group host alternatives
youtu\.be/ # Either youtu.be,
| youtube\.com # or youtube.com
(?: # Group path alternatives
/embed/ # Either /embed/
| /v/ # or /v/
| /watch\?v= # or /watch\?v=
) # End path alternatives.
) # End host alternatives.
([\w-]{10,12}) # Allow 10-12 for 11 char youtube id.
$%x'
;
$result = preg_match($pattern, $url, $matches);
if (false !== $result)
{
return $matches[1];
}
return false;
if (false !== $result)
{
throw new sfValidatorError($this, 'invalid', array('value' => $value));
}
else
{
return true;
}
}
}
But it does not work at all.
Moreover, it could be great if my validator could check if youtube video does exist.
You probably need to change the last lines to something like this:
$result = preg_match($pattern, $url, $matches);
if (!$result) // preg_match() returns 0 on no match and false on error
{
throw new sfValidatorError($this, 'invalid', array('value' => $url));
}
return $url;
This only checks whether the URL submitted by the user is a YouTube URL (that is, whether it matches your regular expression). If it is not, an exception is thrown.
UPDATE
-- deleted--
UPDATE 2
class YoutubeValidator extends sfValidatorUrl
{
protected function configure($options = array(), $messages = array())
{
parent::configure($options, $messages);
$this->setMessage('invalid', 'Veuillez entrer un lien Youtube');
}
protected function doClean($value)
{
$pattern = "/(http(s)?:\/\/)?(?:youtu.be\/|v\/|u\/\w\/|embed\/|watch\?v=)([^#\&\?]*).*/";
preg_match($pattern, $value, $matches);
if (empty($matches[3]))
{
throw new sfValidatorError($this, 'invalid', array('value' => $value));
}
return $matches[3];
}
}
I've tested it and seems to be working ok (returning the actual video id when using $form->getValues()).
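The question also asks whether the validator could check that the video exists. One possible approach, sketched below, is to query YouTube's oEmbed endpoint and treat a JSON response with a title as "exists". The function name youtube_video_exists and the injectable $fetch parameter are assumptions made for illustration, and the sketch assumes the endpoint answers with an HTTP error or a non-JSON body for unknown videos.

```php
<?php
// Hypothetical helper: ask the oEmbed endpoint about the URL.
// $fetch is an optional callable (endpoint URL => body string or false),
// which lets the HTTP request be replaced with a stub when testing.
function youtube_video_exists(string $url, ?callable $fetch = null): bool
{
    $fetch = $fetch ?? function (string $endpoint) {
        // Returns false on HTTP errors; the warning is suppressed.
        return @file_get_contents($endpoint);
    };
    $endpoint = 'https://www.youtube.com/oembed?format=json&url=' . urlencode($url);
    $body = $fetch($endpoint);
    if (!is_string($body)) {
        return false;
    }
    $data = json_decode($body, true);
    return is_array($data) && isset($data['title']);
}

// Usage with a stubbed response, so no network access is needed:
var_dump(youtube_video_exists('http://youtu.be/NLqAF9hrVbY', function ($endpoint) {
    return '{"title":"demo","type":"video"}';
}));
```

Inside doClean(), a false result could be turned into the same sfValidatorError that is thrown for a non-matching URL.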
I've seen a couple of different examples on the site, but none of them gets the ID out of all the YouTube URL variants. As an example, the following link doesn't work with the regex pattern below. Any help would be wonderful. Thanks in advance.
It seems to be just this one: if a user goes to the YouTube homepage and clicks on one of the videos there, they get this URL:
http://www.youtube.com/watch?v=hLSoU53DXK8&feature=g-vrec
My regex puts it in the database as hLSoU53DXK8-vrec, and I need it without -vrec.
// YOUTUBE
$youtube = $_POST['youtube'];
function getYoutubeId($youtube) {
$url = parse_url($youtube);
if($url['host'] !== 'youtube.com' &&
$url['host'] !== 'www.youtube.com'&&
$url['host'] !== 'youtu.be'&&
$url['host'] !== 'www.youtu.be')
return false;
$youtube = preg_replace('~
# Match non-linked youtube URL in the wild. (Rev:20111012)
https?:// # Required scheme. Either http or https.
(?:[0-9A-Z-]+\.)? # Optional subdomain.
(?: # Group host alternatives.
youtu\.be/ # Either youtu.be,
| youtube\.com # or youtube.com followed by
\S* # Allow anything up to VIDEO_ID,
[^\w\-\s] # but char before ID is non-ID char.
) # End host alternatives.
([\w\-]{11}) # $1: VIDEO_ID is exactly 11 chars.
(?=[^\w\-]|$) # Assert next char is non-ID or EOS.
(?! # Assert URL is not pre-linked.
[?=&+%\w]* # Allow URL (query) remainder.
(?: # Group pre-linked alternatives.
[\'"][^<>]*> # Either inside a start tag,
| </a> # or inside <a> element text contents.
) # End recognized pre-linked alts.
) # End negative lookahead assertion.
[?=&+%\w]* # Consume any URL (query) remainder.
~ix',
'$1',
$youtube);
return $youtube;
}
$youtube_id = getYoutubeId($youtube);
$url = "http://www.youtube.com/watch?v=hLSoU53DXK8&feature=g-vrec";
$query_string = array();
parse_str(parse_url($url, PHP_URL_QUERY), $query_string);
$id = $query_string["v"];
Unfortunately, the solution above does not retrieve the YouTube ID for the short URL form "http://youtu.be". So, based on the solutions above, I wrote this function:
function get_youtube_id( $youtube_url ) {
$url = parse_url($youtube_url);
if( $url['host'] !== 'youtube.com' &&
$url['host'] !== 'www.youtube.com'&&
$url['host'] !== 'youtu.be'&&
$url['host'] !== 'www.youtu.be')
return '';
if( $url['host'] === 'youtube.com' || $url['host'] === 'www.youtube.com' ) :
parse_str(parse_url($youtube_url, PHP_URL_QUERY), $query_string);
return $query_string["v"];
endif;
$youtube_id = substr( $url['path'], 1 );
if( strpos( $youtube_id, '/' ) )
$youtube_id = substr( $youtube_id, 0, strpos( $youtube_id, '/' ) );
return $youtube_id;
}
$youtube = "theURL";
$query_string = array();
parse_str(parse_url($youtube, PHP_URL_QUERY), $query_string);
$youtube_id = $query_string["v"];
Let's take these URLs as an example:
http://www.youtube.com/watch?v=8GqqjVXhfMU&feature=youtube_gdata_player
http://www.youtube.com/watch?v=8GqqjVXhfMU
This PHP function will NOT properly obtain the ID in case 1, but will in case 2. Case 1 is very common, where ANYTHING can come behind the YouTube ID.
/**
* get YouTube video ID from URL
*
* @param string $url
* @return string YouTube video id or FALSE if none found.
*/
function youtube_id_from_url($url) {
$pattern =
'%^# Match any YouTube URL
(?:https?://)? # Optional scheme. Either http or https
(?:www\.)? # Optional www subdomain
(?: # Group host alternatives
youtu\.be/ # Either youtu.be,
| youtube\.com # or youtube.com
(?: # Group path alternatives
/embed/ # Either /embed/
| /v/ # or /v/
| /watch\?v= # or /watch\?v=
) # End path alternatives.
) # End host alternatives.
([\w-]{10,12}) # Allow 10-12 for 11 char YouTube id.
$%x'
;
$result = preg_match($pattern, $url, $matches);
if (false !== $result) {
return $matches[1];
}
return false;
}
What I'm thinking is that there must be a way where I can just look for the "v=", no matter where it lies in the URL, and take the characters after that. In this manner, no complex RegEx will be needed. Is this off base? Any ideas for starting points?
if (preg_match('/youtube\.com\/watch\?v=([^\&\?\/]+)/', $url, $id)) {
$values = $id[1];
} else if (preg_match('/youtube\.com\/embed\/([^\&\?\/]+)/', $url, $id)) {
$values = $id[1];
} else if (preg_match('/youtube\.com\/v\/([^\&\?\/]+)/', $url, $id)) {
$values = $id[1];
} else if (preg_match('/youtu\.be\/([^\&\?\/]+)/', $url, $id)) {
$values = $id[1];
}
else if (preg_match('/youtube\.com\/verify_age\?next_url=\/watch%3Fv%3D([^\&\?\/]+)/', $url, $id)) {
$values = $id[1];
} else {
// not a YouTube video
}
This is what I use to extract the ID from a YouTube URL. I think it works in all cases.
Note that at the end, $values holds the ID of the video.
Instead of regex, I highly recommend parse_url() and parse_str():
$url = "http://www.youtube.com/watch?v=8GqqjVXhfMU&feature=youtube_gdata_player";
parse_str(parse_url( $url, PHP_URL_QUERY ), $vars );
echo $vars['v'];
Done
You could just use parse_url and parse_str:
$query_string = parse_url($url, PHP_URL_QUERY);
parse_str($query_string, $params); // the result array is required as of PHP 8
echo $params['v'];
I have used the following patterns because YouTube has a youtube-nocookie.com domain too:
'@youtube(?:-nocookie)?\.com/watch[#\?].*?v=([^"\& ]+)@i',
'@youtube(?:-nocookie)?\.com/embed/([^"\&\? ]+)@i',
'@youtube(?:-nocookie)?\.com/v/([^"\&\? ]+)@i',
'@youtube(?:-nocookie)?\.com/\?v=([^"\& ]+)@i',
'@youtu\.be/([^"\&\? ]+)@i',
'@gdata\.youtube\.com/feeds/api/videos/([^"\&\? ]+)@i',
In your case it would only mean to extend the existing expressions with an optional (-nocookie) for the regular YouTube.com URL like so:
if (preg_match('/youtube(?:-nocookie)?\.com\/watch\?v=([^\&\?\/]+)/', $url, $id)) {
If you change your proposed expression to NOT contain the final $, it should work like you intended. I added the -nocookie as well.
/**
* get YouTube video ID from URL
*
* @param string $url
* @return string YouTube video id or FALSE if none found.
*/
function youtube_id_from_url($url) {
$pattern =
'%^# Match any YouTube URL
(?:https?://)? # Optional scheme. Either http or https
(?:www\.)? # Optional www subdomain
(?: # Group host alternatives
youtu\.be/ # Either youtu.be,
|youtube(?:-nocookie)?\.com # or youtube.com and youtube-nocookie
(?: # Group path alternatives
/embed/ # Either /embed/
| /v/ # or /v/
| /watch\?v= # or /watch\?v=
) # End path alternatives.
) # End host alternatives.
([\w-]{10,12}) # Allow 10-12 for 11 char YouTube id.
%x'
;
$result = preg_match($pattern, $url, $matches);
if ($result) { // preg_match() returns 0 (not false) when nothing matched
return $matches[1];
}
return false;
}
Another easy way is using parse_str():
<?php
$url = 'http://www.youtube.com/watch?v=8GqqjVXhfMU&feature=youtube_gdata_player';
parse_str(parse_url($url, PHP_URL_QUERY), $yt); // parse only the query string, not the whole URL
// The associative array $yt now contains all of the key-value pairs from the query string
echo $yt['v']; // echoes '8GqqjVXhfMU'
?>
The parse_url suggestions are good. If you really want a regex you can use this:
/(?<=v=)[^&]+/
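A possible way to apply such a lookbehind with preg_match is sketched below. Note that the pattern here is a slightly tightened variant, not the one quoted above: it requires "?" or "&" directly before "v=" and also stops at "#". That change and the function name youtube_id_by_lookbehind are assumptions made for this example.

```php
<?php
// Hypothetical helper: match whatever follows "?v=" or "&v=" up to the next "&" or "#".
function youtube_id_by_lookbehind(string $url): ?string
{
    if (preg_match('/(?<=[?&]v=)[^&#]+/', $url, $m) === 1) {
        return $m[0];   // the lookbehind is not part of the match, so $m[0] is the ID itself
    }
    return null;
}

var_dump(youtube_id_by_lookbehind('http://www.youtube.com/watch?v=8GqqjVXhfMU&feature=youtube_gdata_player'));
var_dump(youtube_id_by_lookbehind('http://www.youtube.com/watch?feature=player&v=dQw4w9WgXcQ&var2=bla'));
```

Like the other "v=" approaches, this does not handle youtu.be or /embed/ URLs, which carry the ID in the path.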
SOLUTION for any YOUTUBE LINK:
http://youtube.com/v/dQw4w9WgXcQ
http://youtube.com/watch?v=dQw4w9WgXcQ
http://www.youtube.com/watch?feature=player&v=dQw4w9WgXcQ&var2=bla
http://youtu.be/dQw4w9WgXcQ
==
https://stackoverflow.com/a/20614061/2165415
Here is one solution
/**
* credits goes to: http://stackoverflow.com/questions/11438544/php-regex-for-youtube-video-id
* update: mobile link detection
*/
public function parseYouTubeUrl($url)
{
$pattern = '#^(?:https?://)?(?:www\.)?(?:m\.)?(?:youtu\.be/|youtube\.com(?:/embed/|/v/|/watch\?v=|/watch\?.+&v=))([\w-]{11})(?:.+)?$#x';
preg_match($pattern, $url, $matches);
return (isset($matches[1])) ? $matches[1] : false;
}
It can deal with mobile links too.
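For a quick check of that claim, the same pattern can be wrapped in a standalone function and run against a few links. The name parse_youtube_url_sketch is made up here so the snippet can run outside a class, and the sample URLs are only illustrative.

```php
<?php
// Standalone copy of the method above, using the same pattern.
function parse_youtube_url_sketch(string $url)
{
    $pattern = '#^(?:https?://)?(?:www\.)?(?:m\.)?(?:youtu\.be/|youtube\.com(?:/embed/|/v/|/watch\?v=|/watch\?.+&v=))([\w-]{11})(?:.+)?$#x';
    preg_match($pattern, $url, $matches);
    return isset($matches[1]) ? $matches[1] : false;
}

$samples = array(
    'https://m.youtube.com/watch?v=dQw4w9WgXcQ',                          // mobile link
    'http://www.youtube.com/watch?feature=player&v=dQw4w9WgXcQ&var2=bla', // v is not the first parameter
    'http://youtu.be/dQw4w9WgXcQ',                                        // short link
    'http://example.com/watch?v=dQw4w9WgXcQ',                             // not a YouTube host
);
foreach ($samples as $sample) {
    var_dump(parse_youtube_url_sketch($sample));
}
```

The optional (?:m\.)? group is what accepts the m.youtube.com host; the last sample returns false because the host alternatives do not match.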
Here is my function for retrieving the YouTube ID:
function getYouTubeId($url) {
if (strpos($url, 'v=') === false) return false;
$parse = explode('v=', $url);
$code = $parse[1];
if (strlen($code) < 11) return false;
$code = substr($code, 0, 11);
return $code;
}