I'm trying to get just the ID from a Vimeo URL. Is there a simpler way than this? All the Vimeo video URLs I see look like this:
https://vimeo.com/29474908
https://vimeo.com/38648446
// VIMEO
$vimeo = $_POST['vimeo'];
function getVimeoInfo($vimeo)
{
$url = parse_url($vimeo);
if($url['host'] !== 'vimeo.com' &&
$url['host'] !== 'www.vimeo.com')
return false;
if (preg_match('~^https?://(?:www\.)?vimeo\.com/(?:clip:)?(\d+)~', $vimeo, $match))
{
$id = $match[1];
}
else
{
$id = substr($vimeo,10,strlen($vimeo));
}
if (!function_exists('curl_init')) die('CURL is not installed!');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://vimeo.com/api/v2/video/$id.php");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$output = unserialize(curl_exec($ch));
$output = $output[0];
curl_close($ch);
return $output['id'];
}
$vimeo_id = getVimeoInfo($vimeo);
Many Vimeo URL formats are valid. A few examples:
All valid URLs:
http://vimeo.com/6701902
http://vimeo.com/670190233
http://player.vimeo.com/video/67019023
http://player.vimeo.com/video/6701902
http://player.vimeo.com/video/67019022?title=0&byline=0&portrait=0
http://player.vimeo.com/video/6719022?title=0&byline=0&portrait=0
http://vimeo.com/channels/vimeogirls/6701902
http://vimeo.com/channels/vimeogirls/67019023
http://vimeo.com/channels/staffpicks/67019026
http://vimeo.com/15414122
http://vimeo.com/channels/vimeogirls/66882931
All invalid URLs:
http://vimeo.com/videoschool
http://vimeo.com/videoschool/archive/behind_the_scenes
http://vimeo.com/forums/screening_room
http://vimeo.com/forums/screening_room/topic:42708
I wrote this Java regex that matches all of the valid URLs above and rejects the invalid ones. I'm not sure, though, whether Vimeo has more valid URL formats.
(https?://)?(www\.)?(player\.)?vimeo\.com/([a-z]*/)*([0-9]{6,11})[?]?.*
Hope this helps...
I think using parse_url() is the best option:
$vimeo = 'https://vimeo.com/29474908';
echo (int) substr(parse_url($vimeo, PHP_URL_PATH), 1);
For those who want to see this fully implemented in PHP, here is the regex provided by user2200660, formatted for PHP by Morgan Delaney:
$vimeo = 'http://player.vimeo.com/video/67019023';
if(preg_match("/(https?:\/\/)?(www\.)?(player\.)?vimeo\.com\/([a-z]*\/)*([0-9]{6,11})[?]?.*/", $vimeo, $output_array)) {
echo "Vimeo ID: $output_array[5]";
}
//outputs: Vimeo ID: 67019023
[Edit] You can now do this all via the API!
If you provide a comma separated list of your Vimeo urls via the "links" parameter to the search endpoint (https://developer.vimeo.com/api/endpoints/videos#GET/videos) we will return those videos as API responses.
e.g.
GET https://api.vimeo.com/videos?links=https://vimeo.com/74648232,https://vimeo.com/232323497
[Original]
Vimeo provides many different types of video URLs, some of which do not include the ID. To ensure support across all of Vimeo's URLs, you should ask Vimeo directly for the ID.
You can ask vimeo via the oEmbed endpoint.
There are many options, but the easiest option is to make an HTTP GET request to the url https://vimeo.com/api/oembed.json?url={vimeo_url}, replacing {vimeo_url} with the appropriate url.
For example, to get the ID of the url you provided above (https://vimeo.com/29474908) make an HTTP GET request to
https://vimeo.com/api/oembed.json?url=https://vimeo.com/29474908
Parse the JSON response, and grab the video_id parameter.
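A minimal sketch of the oEmbed approach in PHP. The helper names `vimeoOembedUrl` and `vimeoIdFromOembedJson` are made up for illustration, and the sample JSON is a hand-written stand-in for a real response; only the `video_id` field comes from the description above.

```php
<?php
// Build the oEmbed request URL for a given Vimeo page URL.
// The page URL is percent-encoded so that its own query string survives.
function vimeoOembedUrl(string $vimeoUrl): string
{
    return 'https://vimeo.com/api/oembed.json?url=' . rawurlencode($vimeoUrl);
}

// Pull video_id out of an oEmbed JSON body; returns null if it is absent.
function vimeoIdFromOembedJson(string $json): ?int
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['video_id'])) {
        return null;
    }
    return (int) $data['video_id'];
}

// Hand-written sample body (assumption: a real response has many more fields).
$sample = '{"type":"video","video_id":29474908}';

echo vimeoOembedUrl('https://vimeo.com/29474908'), "\n";
echo vimeoIdFromOembedJson($sample), "\n";
```

The response body itself could be fetched with cURL or file_get_contents() and then passed to the second helper.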
This should retrieve the ID from all kinds of vimeo urls.
$url = 'https://vimeo.com/cool/29474908?title=0&byline=0&portrait=0';
$urlParts = explode("/", parse_url($url, PHP_URL_PATH));
$videoId = (int)$urlParts[count($urlParts)-1];
A current, working regex:
function getIdFromVimeoURL(url) {
return /(vimeo(pro)?\.com)\/(?:[^\d]+)?(\d+)\??(.*)?$/.exec(url)[3];
}
console.log(getIdFromVimeoURL("https://vimeo.com/channels/staffpicks/272053388"))
console.log(getIdFromVimeoURL("https://vimeo.com/272053388"))
console.log(getIdFromVimeoURL("https://player.vimeo.com/video/272053388"))
// ...etc.
If someone needs it in JavaScript, based on @user2200660's answer:
function getVimeoVideoId(url){
    var regex = /(https?:\/\/)?(www\.)?(player\.)?vimeo\.com\/([a-z]*\/)*([0-9]{6,11})[?]?.*/;
    var match = regex.exec(url);
    // Group 5 holds the numeric ID; return null when the URL does not match.
    return match ? match[5] : null;
}
If you only need the Vimeo ID, you can use the RegExp non-capturing groups:
(?:https?:\/\/)?(?:www\.)?vimeo\.com\/(?:(?:[a-z0-9]*\/)*\/?)?([0-9]+)
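As a rough sketch, the pattern above could be used from PHP like this. The function name `vimeoIdFromUrl` is hypothetical, and the delimiter is switched to `~` so the slashes need no escaping; otherwise the pattern is unchanged.

```php
<?php
// Returns the numeric ID as a string, or null when the URL does not match.
function vimeoIdFromUrl(string $url): ?string
{
    // Same pattern as above, written with "~" as the delimiter.
    $pattern = '~(?:https?://)?(?:www\.)?vimeo\.com/(?:(?:[a-z0-9]*/)*/?)?([0-9]+)~';
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

var_dump(vimeoIdFromUrl('https://vimeo.com/channels/staffpicks/272053388'));
var_dump(vimeoIdFromUrl('http://vimeo.com/videoschool'));
```

Because only one group captures, the ID is always at index 1 of the match array, whatever optional parts of the URL are present.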
There are a lot of good answers here, especially @user2200660's.
https://stackoverflow.com/a/16841070/3850405
However a use case that has not been supported in the previous answers is this:
https://vimeo.com/showcase/7008490/video/407943692
Regex that can handle it and the other examples:
(https?:\/\/)?(www\.)?(player\.)?vimeo\.com\/?(showcase\/)*([0-9a-z]*\/)*([0-9]{6,11})[?]?.*
https://regex101.com/r/p2Kldc/1/
$vimeo = 'http://player.vimeo.com/video/67019023';
if(preg_match("/(https?:\/\/)?(www\.)?(player\.)?vimeo\.com\/?(showcase\/)*([0-9a-z]*\/)*([0-9]{6,11})[?]?.*/", $vimeo, $output_array)) {
echo "Vimeo ID: $output_array[6]";
}
Credit to @zeckdude for the original example code in PHP.
https://stackoverflow.com/a/29860052/3850405
In 2022, this is still the one to go with for Vimeo videos:
https://gist.github.com/anjan011/1fcecdc236594e6d700f
(Tested on all the faulty URLs given in the comments as well.)
Related
I am attempting to convert an image url provided by the facebook api into base64 format with cURL.
The API provides a URL like this:
https://fbcdn-sphotos-g-a.akamaihd.net/hphotos-ak-xfp1/v/t1.0-9/p180x540/72099_736078480783_68792122_n.jpg?oh=f3698c5eed12c1f2503b147d221f39d1&oe=54C5BA4E&__gda__=1418090980_c7af12de6b0dd8abe752f801c1d61e0d
The issue is that the URL only works with the oh, oe and __gda__ parameters included in the query string; there is no direct image URL. Removing the parameters sends you to a Facebook error page.
With the parameterized URL, my curl_exec call is not getting correct image data. Is there a way to get the base64 data from Facebook, or is there something I can do to get the plain image URL from the parameterized one?
This is what my script looks like:
header('Access-Control-Allow-Origin: *');
$url = $_GET['url'];
try {
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 3);
$result = curl_exec($c);
curl_close ($c);
if(false===$result) {
echo 'fail';
} else {
$base64 = "data:image/jpeg;charset=UTF-8;base64,".base64_encode($result);
echo $base64;
}
} catch ( \ErrorException $e ) {
echo 'fail';
}
To address your specific problem, your script is likely failing because the required oh, oe, __gda__ parameters are getting separated during the GET request and therefore are not included in $_GET['url'].
Make sure you're using a URL-encoded string so any unencoded & characters aren't handled as delimiters. Then just decode the string before passing it on to cURL.
...
$url = urldecode($_GET['url']);
...
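The effect described above can be reproduced offline. In this sketch parse_str() stands in for the way PHP fills $_GET, and the image URL is a shortened, made-up stand-in (example.com is an assumption, not a Facebook host).

```php
<?php
// Shortened stand-in for the parameterized image URL (hypothetical host and values).
$imageUrl = 'https://example.com/photo.jpg?oh=abc&oe=54C5BA4E&__gda__=123_def';

// Unencoded: every "&" inside the image URL starts a new parameter.
parse_str('url=' . $imageUrl, $broken);

// Encoded: http_build_query() percent-encodes the value, so it stays in one piece.
$query = http_build_query(['url' => $imageUrl]);
parse_str($query, $ok);

echo $broken['url'], "\n"; // cut off at the first "&"
echo $ok['url'], "\n";     // the full image URL
```

On the calling side, the same encoding can be done with urlencode() in PHP or encodeURIComponent() in JavaScript before the URL is appended to the request.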
For anyone curious, you can still load any Facebook image from any one of their legacy CDNs without needing the new parameters:
https://scontent-a-iad.xx.fbcdn.net/hphotos-frc3/
https://scontent-b-iad.xx.fbcdn.net/hphotos-frc3/
https://scontent-c-iad.xx.fbcdn.net/hphotos-frc3/
Just append the original image filename to the URL et voila.
Disclaimer: I have no idea how long this little trick will work for so don't use it on anything important in production.
Maybe this won't help much, but it seems that the original picture (ending with _o) does not need the __gda__, oe or oh parameters.
To get the original profile picture you can do:
var username_or_id = "name.lastname" //Example
get_url ("http://graph.facebook.com/$username_or_id/picture?width=9999")
hth
I had a similar problem. My solution:
$url = urldecode($url);
return base64_encode(file_get_contents($url));
Where the URL is to Graph API:
https://graph.facebook.com/$user_id/picture?width=160
(You probably want to also check, if file_get_contents returns something)
You just need to set CURLOPT_SSL_VERIFYPEER to false, as the URL from Facebook is https and not http. Alternatively, you could request the URL without SSL by replacing https with http.
Try the code below:
$url = 'https://fbcdn-sphotos-g-a.akamaihd.net/hphotos-ak-xfp1/v/t1.0-9/p180x540/72099_736078480783_68792122_n.jpg?oh=f3698c5eed12c1f2503b147d221f39d1&oe=54C5BA4E&__gda__=1418090980_c7af12de6b0dd8abe752f801c1d61e0d';
try {
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 3);
/***********************************************/
// you need the curl ssl_opt_verifypeer
curl_setopt($c, CURLOPT_SSL_VERIFYPEER, false);
/***********************************************/
$result = curl_exec($c);
curl_close ($c);
if(false===$result) {
echo 'fail';
} else {
$base64 = '<img alt="Embedded Image" src="data:image/jpeg;charset=UTF-8;base64,'.base64_encode($result).'"/>';
echo $base64;
}
}
catch ( \ErrorException $e ) {
echo 'fail';
}
I want to search Google Images for a keyword such as "car" and get the resulting car images.
I found two links on how to implement this:
PHP class to retrieve multiple images from Google using curl multi handler
Google image API using cURL
I implemented these as well, but they returned only 4 random images, no more than that.
Question: How can I get car images in PHP by keyword, the way a search on Google does?
Any suggestions will be appreciated!
You could use the PHP Simple HTML DOM library for this:
<?php
include "simple_html_dom.php";
$search_query = "ENTER YOUR SEARCH QUERY HERE";
$search_query = urlencode( $search_query );
$html = file_get_html( "https://www.google.com/search?q=$search_query&tbm=isch" );
$image_container = $html->find('div#rcnt', 0);
$images = $image_container->find('img');
$image_count = 10; //Enter the amount of images to be shown
$i = 0;
foreach($images as $image){
if($i == $image_count) break;
$i++;
// DO with the image whatever you want here (the image element is '$image'):
echo $image;
}
This will print a specific number of images (number is set in '$image_count').
For more information, see the PHP Simple HTML DOM library documentation.
I'm not entirely sure about this, but Google provides good documentation on it.
$url = "https://ajax.googleapis.com/ajax/services/search/images?" .
"v=1.0&q=barack%20obama&userip=INSERT-USER-IP";
// sendRequest
// note how referer is set manually
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, 'ENTER-THE-URL-OF-YOUR-SITE-HERE'); // placeholder: the URL of your site
$body = curl_exec($ch);
curl_close($ch);
// now, process the JSON string
$json = json_decode($body);
// now have some fun with the results...
This is from Google's official developer guide on image searching. For more details, see:
https://developers.google.com/image-search/v1/jsondevguide#json_snippets_php
Set your search keywords in the q parameter of $url.
I'm a beginner at PHP. I have one task in my project, which is to fetch all videos from a YouTube link using curl in PHP. Is it possible to show all videos from YouTube?
I found this code with a Google search:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.youtube.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$contents = curl_exec ($ch);
echo $contents;
curl_close ($ch);
?>
It shows the YouTube site, but when I click any video it will not play.
You can get data from the YouTube oEmbed interface, which returns metadata about a video in two formats, XML and JSON:
http://www.youtube.com/oembed?url={videoUrlHere}&format=json
Using your example, a call to:
http://www.youtube.com/oembed?url=http://www.youtube.com/watch?v=B4CRkpBGQzU&format=json
So you can do it like this:
$url = "Your_Youtube_video_link";
Example:
$url = "http://www.youtube.com/watch?v=m7svJHmgJqs";
$youtube = "http://www.youtube.com/oembed?url=" . urlencode($url) . "&format=json";
$curl = curl_init($youtube);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$return = curl_exec($curl);
curl_close($curl);
$result = json_decode($return, true);
echo $result['html'];
Try it...Hope it will help you.
You could use curl to retrieve the Google main page (or an alternative page) and parse the returned html using a library such as html5lib. If you wanted to try this approach the first step could be to 'view source' on the relevant page and look at how the links are structured.
A more elegant way to approach the problem could be to use the Youtube API (a way to interact with the Youtube system), which may allow you to retrieve the links directly. e.g it may be possible to just ask the Youtube API to send you the links. Try this.
You can also get all of a YouTube channel's videos using file_get_contents.
Below is sample working code:
<?php
$Youtube_API_Key = ""; // you can obtain api key : https://developers.google.com/youtube/registering_an_application
$Youtube_channel_id = "";
$TotalVideso = 50; // 50 is max , if you want more video you need Youtube secret key.
$order= "date"; ////allowed order : date,rating,relevance,title,videocount,viewcount
$url = "https://www.googleapis.com/youtube/v3/search?key=".$Youtube_API_Key."&channelId=".$Youtube_channel_id."&part=id&order=".$order."&maxResults=".$TotalVideso."&format=json";
$data = file_get_contents($url);
$JsonDecodeData = json_decode($data, true);
print_r($JsonDecodeData); // print the decoded array rather than the raw JSON string
?>
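Once the response is decoded, the video IDs can be collected from it. The sketch below assumes the usual shape of a search response, where each entry in `items` has an `id` object that carries `videoId` for videos; the helper name `extractVideoIds` and the sample IDs are made up.

```php
<?php
// Collect the video IDs from a decoded search response.
// Items that are channels or playlists carry no "videoId" and are skipped.
function extractVideoIds(array $response): array
{
    $ids = [];
    foreach ($response['items'] ?? [] as $item) {
        if (isset($item['id']['videoId'])) {
            $ids[] = $item['id']['videoId'];
        }
    }
    return $ids;
}

// Hand-written sample in the shape of a search response (IDs are invented).
$sample = json_decode('{"items":[
    {"id":{"kind":"youtube#video","videoId":"abc123"}},
    {"id":{"kind":"youtube#channel","channelId":"UCxyz"}},
    {"id":{"kind":"youtube#video","videoId":"def456"}}
]}', true);

print_r(extractVideoIds($sample));
```

Each ID can then be turned into a watch URL by appending it to http://www.youtube.com/watch?v=.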
Here is a function that validates .edu TLD and checks that the url does not point to a .pdf document or a .doc document.
public function validateEduDomain($url) {
    // The extension alternatives are grouped so that "$" applies to all of them.
    if( preg_match('/^https?:\/\/[A-Za-z]+[A-Za-z0-9\.-]+\.edu/i', $url) && !preg_match('/\.(pdf|doc)$/i', $url) ) {
        return TRUE;
    }
    return FALSE;
}
Now I am encountering links that point to jpg, rtf and others that simple_html_dom tries to parse and return its content. I want to avoid this happening by skipping all such links. The problem is that the list is non-exhaustive and I want the code to skip all such links. How am I supposed to do that??
Trying to filter URLs by guessing what's behind them will always fail in a number of cases. Assuming you are using curl to download, you should check whether the response's Content-Type header is among the acceptable ones:
<?php
require "simple_html_dom.php";
$curl = curl_init();
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); //default is to output it
$urls = array(
"google.com",
"https://www.google.com/logos/2012/newyearsday-2012-hp.jpg",
"http://cran.r-project.org/doc/manuals/R-intro.pdf",
);
$acceptable_types = array("text/html", "application/xhtml+xml");
foreach ($urls as $url) {
curl_setopt($curl, CURLOPT_URL, $url);
$contents = curl_exec($curl);
//we need to handle content-types like "text/html; charset=utf-8"
list($response_type) = explode(";", curl_getinfo($curl, CURLINFO_CONTENT_TYPE));
if (in_array($response_type, $acceptable_types)) {
echo "accepting {$url}\n";
// create a simple_html_dom object from string
$obj = str_get_html($contents);
} else {
echo "rejecting {$url} ({$response_type})\n";
}
}
running the above results in:
accepting google.com
rejecting https://www.google.com/logos/2012/newyearsday-2012-hp.jpg (image/jpeg)
rejecting http://cran.r-project.org/doc/manuals/R-intro.pdf (application/pdf)
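The header check can be pulled into a small helper that also copes with upper-case values and with a missing header (curl_getinfo() can return null for CURLINFO_CONTENT_TYPE). This is only a sketch; the function names `mediaType` and `isAcceptableType` are hypothetical.

```php
<?php
// Reduce a Content-Type header value to its bare media type,
// e.g. "Text/HTML; charset=utf-8" becomes "text/html".
function mediaType(?string $contentType): string
{
    $parts = explode(';', (string) $contentType);
    return strtolower(trim($parts[0]));
}

// True when the media type is one of the acceptable ones.
function isAcceptableType(?string $contentType, array $acceptable): bool
{
    return in_array(mediaType($contentType), $acceptable, true);
}

$acceptable = ['text/html', 'application/xhtml+xml'];
var_dump(isAcceptableType('text/html; charset=utf-8', $acceptable));
var_dump(isAcceptableType('image/jpeg', $acceptable));
var_dump(isAcceptableType(null, $acceptable));
```

If downloading a body only to discard it is a concern, curl's CURLOPT_NOBODY option issues a HEAD request instead, though not every server answers HEAD requests correctly.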
Update the last regex to something like this:
!preg_match('/\.(pdf|doc|jpg|rtf)$/i', $url) )
This will filter out the JPG and RTF documents.
You have to add the extensions to the regex above to omit them.
Update
I don't think it's possible to block every sort of extension, and I personally don't recommend it for scraping either. You will have to skip some extensions to keep crawling. Why don't you change your regex filter to list the extensions you would like to accept, like:
preg_match('/\.(html?|php|aspx)$/i', $url) )
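An allow-list like this can also be applied to the path only, so that query strings and extension-less URLs are handled. This is a sketch under assumptions: the function name `hasAllowedExtension` and the default list are illustrative, and URLs without any extension are accepted.

```php
<?php
// Accept a URL when its path has no extension or one from the allow-list.
// The default list is only an example.
function hasAllowedExtension(string $url, array $allowed = ['html', 'htm', 'php', 'aspx']): bool
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return true; // no path at all, e.g. "http://example.edu"
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return $ext === '' || in_array($ext, $allowed, true);
}

var_dump(hasAllowedExtension('http://example.edu/index.html'));
var_dump(hasAllowedExtension('http://example.edu/a/paper.pdf?x=1'));
```

An extension check is still a guess about the content, so it works best as a cheap first filter.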
How can I find the total number of inbound and outbound links of a website using PHP?
To count outbound links:
Fetch the HTML of the web page.
Extract all links using a regex.
Filter out links that start with your own domain or "/".
To count inbound links:
Grab the Google results page:
http://www.google.ca/search?sourceid=chrome&ie=UTF-8&q=site:
Parse it similarly.
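The outbound steps could be sketched as follows. The regex is deliberately simplistic (it only looks at quoted href attributes on `<a>` tags), the function name `countOutboundLinks` is made up, and the HTML is a hand-written sample rather than a fetched page.

```php
<?php
// Count links in $html whose host differs from $ownHost.
// Relative links (no host part) are treated as internal.
function countOutboundLinks(string $html, string $ownHost): int
{
    // Capture the value of every quoted href attribute on an <a> tag.
    preg_match_all('~<a\s[^>]*href=["\']([^"\']+)["\']~i', $html, $matches);

    $count = 0;
    foreach ($matches[1] as $href) {
        $host = parse_url($href, PHP_URL_HOST);
        if (is_string($host) && strcasecmp($host, $ownHost) !== 0) {
            $count++;
        }
    }
    return $count;
}

$html = '<a href="/about">About</a>'
      . '<a href="http://example.com/contact">Contact</a>'
      . '<a href="http://other.example.org/">Other</a>'
      . '<a class="x" href="https://third.example.net/page">Third</a>';

echo countOutboundLinks($html, 'example.com'), "\n";
```

For real pages, an HTML parser is usually more reliable than a regex, since attributes can be unquoted or split across lines.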
For outbound links, you will have to parse the HTML code of the website as some here have suggested.
For inbound links, I suggest using the Google Custom Search API; sending direct requests to Google can get your IP banned. Here is a function I use in my code for this API:
function doGoogleSearch($searchTerm)
{
$referer = 'http://your-site.com';
$args['q'] = $searchTerm;
$endpoint = 'web';
$url = "http://ajax.googleapis.com/ajax/services/search/".$endpoint;
$args['v'] = '1.0';
$key= 'your-api-key';
$url .= '?'.http_build_query($args, '', '&');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $referer);
$body = curl_exec($ch);
curl_close($ch);
//decode and return the response
return json_decode($body);
}
After calling this function as: $result = doGoogleSearch('link:site.com'), the variable $result->cursor->estimatedResultCount will have the number of results returned.
PHP can't determine the inbound links of a page through some trivial action. You either have to monitor all incoming visitors and check what their referrer is, or parse the entire internet for links that point to that site. The first method will miss links not getting used, and the second method is best left to Google.
On the other hand, counting the outbound links from a site is doable. You can read in a page and analyze the text for links with a regular expression, counting up the total.
// $host is the domain name.
// getPageData() is assumed to be defined elsewhere and to return the page HTML as a string.
function getGoogleLinks($host)
{
    $request = "http://www.google.com/search?q=" . urlencode("link:" . $host) . "&hl=en";
    $data = getPageData($request);
    // Return the result count if the stats element is found, otherwise "n/a".
    if (preg_match('/<div id=resultStats>(About )?([\d,]+) result/si', $data, $l)) {
        return $l[2];
    }
    return "n/a";
}