I have php code that correctly retrieves, using the YouTube api, the title, video url, viewcount, video date, last comment date, and the first 160 characters of the description. I can't seem to figure out how to get the entire description. I know it is there in the xml retrieved, because I have dumped that. So how come I am only getting 160 chars?
The entire description is truncated at 157 chars, and "..." is added, so that by the time I echo it or var_dump it, it is 160 chars. Here is my complete test code (without title, video url, etc etc).
<?php
$feedURL = 'http://gdata.youtube.com/feeds/api/videos?q=phone&v=2&fields=entry[yt:statistics/#viewCount > 10000]&start-index=1&max-results=1';
$sxml = simplexml_load_file($feedURL);
foreach ($sxml->entry as $entry) {
    $media = $entry->children('http://search.yahoo.com/mrss/');
    echo $media->group->description;
}
?>
This is what displays on the page:
FREE TuTiTu's Games: http://www.tutitu.tv/index.php/games FREE TuTiTu's Coloring pages at: http://www.tutitu.tv/index.php/coloring Join us on Facebook: https...
When I get the xml this way:
gdata.youtube.com/feeds/api/videos/JI-5kh_4gO0?v=2&alt=json-in-script&callback=youtubeFeedCallback&prettyprint=true
The entire description looks like this:
"media$description": {
"$t": "FREE TuTiTu's Games: http://www.tutitu.tv/index.php/games\nFREE TuTiTu's Coloring pages at: http://www.tutitu.tv/index.php/coloring\nJoin us on Facebook: https://www.facebook.com/TuTiTuTV\nTuTiTu's T-Shirts: http://www.zazzle.com/TuTiTu?rf=238778092083495163\n\nTuTiTu - The toys come to life\n\nTuTiTu - \"The toys come to life\" is a 3D animated television show targeting 2-3 year olds. Through colorful shapes TuTiTu will stimulate the children's imagination and creativity. On each episode TuTiTu's shapes will transform into a new and exciting toy.",
"type": "plain"
},
I'm sure I am missing something basic, but when I've looked for a solution, I have not found it.
Thanks for any help.
These two types of API request return descriptions of different sizes.
I assume it's a way to limit the total response size.
1) doing a search as in: http://gdata.youtube.com/feeds/api/videos?q=phone&v=2&fields=entry&alt=json&prettyprint=true will return the short video description.
2) doing a video request as in: http://gdata.youtube.com/feeds/api/videos/JI-5kh_4gO0?v=2&alt=json&prettyprint=true will return the long video description.
BTW: API version 3 will let you request a list of video IDs in one request (to get their long descriptions).
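As a minimal sketch, here is option 2 in PHP: fetching the per-video feed (using the video ID from the question and the same simplexml approach). The load is guarded because this legacy feed URL may no longer resolve:

```php
<?php
// Fetch the single-video feed, which carries the full description,
// instead of the search feed, which truncates it.
$videoId = 'JI-5kh_4gO0'; // video ID taken from the question
$feedURL = 'http://gdata.youtube.com/feeds/api/videos/' . $videoId . '?v=2';
$sxml = @simplexml_load_file($feedURL);
if ($sxml !== false) {
    // The description lives in the Media RSS namespace, as in the question.
    $media = $sxml->children('http://search.yahoo.com/mrss/');
    echo $media->group->description;
}
?>
```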
$media->group->{'media$description'} should do the trick
I volunteer in a communal library and I'm in charge of the digital transition.
I'm using the free and open-source software PMB, and I want to automate the retrieval of book titles with the Knowledge Graph API (which is not possible within PMB, unless I missed something).
Why use Knowledge Graph instead of ISBNdb or another free ISBN API? Because none of them is as complete and high-quality as KG.
For example, I take the ISBN of a French book: 9782884613736 ("Le foot illustré de A à Z").
Not found on ISBNdb.com, etc.
So when I google it, the Knowledge Graph returns exactly what I want:
> Screenshot of what I see
But when I'm using the API:
GET https://kgsearch.googleapis.com/v1/entities:search?languages=fr&query=9782884613736&types=Book&key={YOUR_API_KEY}
{
  "@context": {
    "@vocab": "http://schema.org/",
    "goog": "http://schema.googleapis.com/",
    "EntitySearchResult": "goog:EntitySearchResult",
    "detailedDescription": "goog:detailedDescription",
    "kg": "http://g.co/kg"
  },
  "@type": "ItemList",
  "itemListElement": []
}
Nothing is returned for my GET request. (It works properly if I request by the book title; the information comes back fine.)
I tried with different types according to schema.org : Book, BookSeries, BookFormatType.
Is there a way to use KG API as I want ?
I'm totally open to all suggestions (even to use another method to reach my aim).
Thank you.
The ISBN seems to be wrong, or doesn't exist in Google's DB. Here's a sample using the Google Books API:
ISBN_10 : 2884610154
https://www.googleapis.com/books/v1/volumes?languages=fr&q=isbn:2884610154
ISBN_13 : 9782884610155
https://www.googleapis.com/books/v1/volumes?languages=fr&q=isbn:9782884610155
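A minimal sketch of looking a title up by ISBN with that endpoint in PHP. The `q=isbn:` query form is the documented one; the guard-and-silence error handling is an assumption for the sketch:

```php
<?php
// Look up a volume by ISBN via the Google Books API and print its title.
// Uses the ISBN_13 from the answer above; network access is assumed.
$isbn = '9782884610155';
$url = 'https://www.googleapis.com/books/v1/volumes?q=isbn:' . $isbn;
$json = @file_get_contents($url);
if ($json !== false) {
    $data = json_decode($json, true);
    // The title sits under items[0].volumeInfo.title when a match exists.
    if (!empty($data['items'][0]['volumeInfo']['title'])) {
        echo $data['items'][0]['volumeInfo']['title'], "\n";
    }
}
?>
```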
When I am performing Open Graph requests, some of the responses that I expect to be plain text contain some kind of markup. For example, when I request the name and description of an album, in the description I get something like \u0040[12412421421:124:The Link]. (The \u0040 is actually the @ sign.)
In this case it seems to be saying that 'The Link' should be a hyperlink to a Facebook page with ID 12412421421. I presume there is a similar kind of markup for hashtags and external URLs.
I am trying to find some official documentation or description for this, but I can't seem to find any (I might be looking with the wrong keywords).
Is there any online documentation that describes this? And better still, is there a PHP library or function already available somewhere that converts this text into its HTML equivalent?
I am using this Facebook PHP SDK, but it doesn't seem to offer any such function. (Not sure if there is anything in the new version 4.0, but I can't use it anyway for now because it requires PHP 5.4+ and my host is still on 5.3.)
It's true that the PHP SDK doesn't provide anything to deal with these links and the documentation doesn't document that either. However the API gives all the information you need in the description field itself, so here is what you could do:
$description = "Live concert with @[66961492640:274:Moonbootica] "
             . "in @[106078429431815:274:London, United Kingdom]! #music #house";
// Note: the <a href> targets below are a plausible reconstruction;
// the anchor tags in the original answer were stripped.
function get_html_description($description) {
    return
        // 1. Handle tags (pages, people, etc.)
        preg_replace_callback("/@\[([0-9]*):([0-9]*):(.*?)\]/", function($match) {
            return '<a href="https://www.facebook.com/' . $match[1] . '">' . $match[3] . '</a>';
        },
        // 2. Handle hashtags
        preg_replace_callback("/#(\w+)/", function($match) {
            return '<a href="https://www.facebook.com/hashtag/' . $match[1] . '">' . $match[0] . '</a>';
        },
        // 3. Handle line breaks
        str_replace("\n", "<br />", $description)));
}
// Display HTML
echo get_html_description($description);
While parts 2 and 3 handle hashtags and line breaks, part 1 splits the tag @[ID:TYPE:NAME] into 3 groups of information (id, type, name) before generating HTML links from the page IDs and names:
Live concert with Moonbootica in London, United Kingdom! #music #house
FYI, even if it's not that useful, here are the meanings of the types:
an app (128),
a page (274),
a user (2048).
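If you need to branch on the type while rendering, a tiny lookup based on the codes above could look like this (the function name is made up for illustration):

```php
<?php
// Map the numeric TYPE from the @[ID:TYPE:NAME] markup to a label.
// Codes taken from the list above; anything else is unknown.
function tag_type_label($type) {
    $labels = array(128 => 'app', 274 => 'page', 2048 => 'user');
    return isset($labels[$type]) ? $labels[$type] : 'unknown';
}
echo tag_type_label(274); // page
?>
```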
The @ marks a tag of someone. A Facebook ID doesn't distinguish between a fan page and a single person, so you have to deal with that in PHP, and the @ should be the only character that marks a tagged person/page.
The markup is used to reference a fanpage.
Example:
"description": "Event organised by @[303925999750490:274:World Next Top Model MALTA]\nPhotography by @[445645795469650:274:Pixbymax Photography]"
The 303925999750490 is the fanpage ID. The World Next Top Model MALTA is the name of fanpage. (Don't know what the 274 means)
When you render this on your page, you can render like this:
Event organised by World Next Top Model MALTA
Photography by Pixbymax Photography
I am working with the Facebook Open Graph API, where users can post videos to their timeline.
Now, on the page where we play the video (we call it the video play page), we generate the meta properties dynamically. The process goes as follows:
The production team adds video in the database and each video has a property called "title".
In addition, there is a separate DB table we call meta_seo, where SEO-related information (for example title, description, etc.) is added for each video.
Now, while generating the meta information for the page, we first check whether there is information in the meta_seo table; if found, we generate the meta tags from it, otherwise we retrieve them from the video object, which also contains some information. So for the FB og:title we always have some information about the video title.
The posting on FB timeline using open graph api works pretty well.
But there are some changes needed on the title being posted.
The video titles stored in the database can take these forms:
video title1
video title2
video title - mysite.com
video title | MySite.com
So when we post a video to FB timeline some of the videos may not have the mysite.com at the very end of the title. But we need that mysite.com to be appended always.
So based on above possibilities I have written a script as
$haystack1 = "Title1- Online class - mysite.com";
$haystack2 = "Title2 - Online Class";
$needle = 'mysite.com';
if (strripos($haystack1, $needle) === false) {
    echo $haystack1 . ' | ' . 'MySite.com';
} else {
    echo $haystack1;
}
if (strripos($haystack2, $needle) === false) {
    echo $haystack2 . ' | ' . 'MySite.com';
} else {
    echo $haystack2;
}
The above code works pretty well. But my question is: is there a better way to achieve this, or should I just do it the way I pointed out above?
I am not tagging this with Facebook, since it has nothing to do with Facebook; there are no issues whatsoever posting to FB or the Graph API / Open Graph.
You can use a regex for more flexibility:
if (preg_match('/\s*(-|\|)?\s*' . str_replace('.', '\.', $needle) . '\s*$/i', $haystack1)) {
    // the site name is already there
    echo $haystack1;
} else {
    echo $haystack1 . ' | ' . 'MySite.com';
}
So the script will output:
video title1 => video title1 | MySite.com
video title2 => video title2 | MySite.com
video title - mysite.com => video title - mysite.com
video title | MySite.com => video title | MySite.com
Is that what you need? Or do you want to replace "- mysite.com" with "| MySite.com" too?
I recommend storing the haystacks in an array and then using my answer above inside a foreach loop.
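A sketch of that suggestion, combining the regex above with a foreach over an array of haystacks (sample titles taken from the question; preg_quote replaces the manual dot escaping):

```php
<?php
// Append " | MySite.com" to every title that does not already end
// with the site name, in either "- mysite.com" or "| MySite.com" form.
$needle = 'mysite.com';
$haystacks = array(
    'video title1',
    'video title - mysite.com',
    'video title | MySite.com',
);
$pattern = '/\s*(-|\|)?\s*' . preg_quote($needle, '/') . '\s*$/i';
$titles = array();
foreach ($haystacks as $haystack) {
    if (preg_match($pattern, $haystack)) {
        $titles[] = $haystack;           // site name already present
    } else {
        $titles[] = $haystack . ' | MySite.com';
    }
}
print_r($titles);
?>
```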
I'm using 'Simple HTML Dom' to scrape the HN Front Page (news.ycombinator.com), which works great most of the time.
However, every now and then they promote a job/company that lacks the elements that the scraper is looking for, i.e. score, username and number of comments.
This, of course, breaks the array and thus the output of my script:
<?php
// 2012-02-12 Maximilian (Extract news.ycombinator.com's Front Page)
// Set the header during development
//header ("content-type: text/xml");
// Call the external PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/manual.htm)
include('lib/simple_html_dom.php');
date_default_timezone_set('Europe/Berlin');
// Download 'news.ycombinator.com' content
//$tmp = file_get_contents('http://news.ycombinator.com');
//file_put_contents('get.tmp', $tmp);
// Retrieve the content
$html = file_get_html('tc.tmp');
// Set the extraction pattern for each item
$title = $html->find("tr td table tr td.title a");
$score = $html->find("tr td.subtext span");
$user = $html->find("tr td.subtext a[href^=user]");
$link = $html->find("tr td table tr td.title a");
$time = $html->find("tr td.subtext");
$additionals = $html->find("tr td.subtext a[href^=item?id]");
// Construct the feed by looping through the items
$result = '';
for ($i = 0; $i < 29; $i++) {
    // Check if the item points to an external website
    if (!strstr($link[$i]->href, 'http')) {
        $url = 'http://news.ycombinator.com/' . $link[$i]->href;
        $description = "Join the discussion on Hacker News.";
    } else {
        $url = $link[$i]->href;
        // Getting content here
        if (empty($abstract)) {
            $description = "Failed to load any relevant content. Please try again later.";
        } else {
            $description = $abstract;
        }
    }
    // Put all the items together
    $result .= '<item><id>f'.$i.'</id><title>'.htmlspecialchars(trim($title[$i]->plaintext)).'</title><description><![CDATA['.$description.']]></description><pubDate>'.str_replace(' | '.$additionals[$i]->plaintext,'',str_replace($score[$i]->plaintext.' by '.$user[$i]->plaintext.' ','',$time[$i]->plaintext)).'</pubDate><score>'.$score[$i]->plaintext.'</score><user>'.$user[$i]->plaintext.'</user><comments>'.$additionals[$i]->plaintext.'</comments><id>'.substr($additionals[$i]->href,8).'</id><discussion>http://news.ycombinator.com/'.$additionals[$i]->href.'</discussion><link>'.htmlspecialchars($url).'</link></item>';
}
$output = '<rss><channel><id>news.ycombinator.com Frontpage</id><buildDate>'.date('Y-m-d H:i:s').'</buildDate>'.$result.'</channel></rss>';
file_put_contents('tc.xml', $output);
?>
Here's an example of the correct output
<item>
<id>f0</id>
<title>Show HN: Bootswatch, free swatches for your Bootstrap site</title>
<description><![CDATA[Easy to Install Simply download the CSS file from the swatch of your choice and replace the one in Bootstrap. No messing around with hex values. Whole New Feel We've all been there with the black bar and blue buttons. See how a splash of color and typography can transform the feel of your site. Modular Changes are contained in just two LESS files, enabling modification and ensuring forward compatibility.]]></description>
<pubDate>3 hours ago</pubDate>
<score>196 points</score>
<user>parkov</user>
<comments>30 comments</comments>
<id>3594540</id>
<discussion>http://news.ycombinator.com/item?id=3594540</discussion>
<link>http://bootswatch.com</link>
</item>
<item>
<id>f1</id>
<title>Louis CK inspires Jim Gaffigan to sell comedy special for $5 online</title>
<description><![CDATA[Dear Internet Friends,Inspired by the brilliant Louis CK, I have decided to debut my all-new hour stand-up special on my website, Jimgaffigan.com.Beginning sometime in April, “Jim Gaffigan: Mr. Universe” will be available exclusively for download for only $5. A dollar from each download will go directly to The Bob Woodruff Foundation; a charity dedicated to serving injured Veterans and their families.I am confident that the low price of my new comedy special and the fact that 20% of each $5 download will be donated to this very noble cause will prevent people from stealing it. Maybe I’m being naïve, but I trust you guys.]]></description>
<pubDate>57 minutes ago</pubDate>
<score>25 points</score>
<user>rkudeshi</user>
<comments>4 comments</comments>
<id>3595285</id>
<discussion>http://news.ycombinator.com/item?id=3595285</discussion>
<link>http://www.whosay.com/jimgaffigan/content/218011</link>
</item>
And here's an example of incorrect output. Note that the elements are not empty, thus I cannot seem to catch the error and simply jump to the next item. Everything past the promotion post will break:
<item>
<id>f14</id>
<title>Build the next Legos: We're hiring an iOS Developer & Web Developer (YC S11)</title>
<description><![CDATA[Interested in building the next generation of toys on digital devices such as the iPad? That’s what we’re doing here at Launchpad Toys with apps like Toontastic (Named one of the “Top 10 iPad Apps of 2011” by the New York Times and was recently added to the iTunes Hall of Fame) and an awesom]]><![CDATA[e suite of others we have under development. We’re looking for creative and playful coders that have made games or highly visual apps/sites in the past for our two open development positions. As a kid, you probably played with Legos endlessly and grew up to be a hacker because you still love building things. Sounds like you? Email us at howdy@launchpadtoys.com with a couple links to some projects and code that we can look at along with your resume.]]></description>
<pubDate>2 hours ago</pubDate>
<score>14 points</score>
<user>bproper</user>
<comments>7 comments</comments>
<id>3594944</id>
<discussion>http://news.ycombinator.com/item?id=3594944</discussion>
<link>http://launchpadtoys.com/blog/2012/02/iosdeveloper-webdeveloper/</link>
</item>
<item>
<id>f15</id>
<title>SOPA foe Fred Wilson supports a blacklist on pirate sites</title>
<description><![CDATA[VC Fred Wilson says Google, Bing, Facebook, and Twitter should warn people when they try to log in at known pirate sites: "We don't need legislation." Fred Wilson says: If they try to pass antipiracy legislation, it will once again be 'war.' (Credit: Greg Sandoval/CNET) Fred Wilson, a well-known ven]]><![CDATA[ture capitalist from New York, says he's in favor of creating a blacklist for Web sites found to traffic in pirated films, music, and other intellectual property. The co-founder of Union Square Ventures told a gathering of media executives at the Paley Center for Media yesterday that he believes a good antipiracy measure would be for Google, Twitter, Facebook, and other major sites to issue warnings to people when they try to connect with a known pirate site. Fred Wilson, a co-founder of Union Square Ventures, says 'Our children have been taught to steal.' (Credit: Union Square Ventures) Wilson favors establishing an independent group to create a "black and white list." "The blacklist are those sites we all know are bad news," he told the audience in New York.]]></description>
<pubDate>14 points by bproper 2 hours ago | 7 comments</pubDate>
<score>24 points</score>
<user>andrewcross</user>
<comments>12 comments</comments>
<id>3594558</id>
<discussion>http://news.ycombinator.com/item?id=3594558</discussion>
<link>http://news.cnet.com/8301-31001_3-57377862-261/post-sopa-influential-tech-investor-favors-blacklisting-pirate-sites/</link>
</item>
So here's my question: how can I handle a situation where a particular element is missing and find() doesn't throw an error? Do I have to start from scratch, or is there a better approach to scraping the HN front page?
For anyone curious, here's the whole XML file: http://thequeue.org/api/tc.xml
You have to work in chunks to handle that. There seems to be a dummy spacer element that can help you:
$news = preg_split('/<tr style="height:5px"><\/tr>/',$html->find('tbody',2)->innertext);
And then use subselectors:
foreach ($news as $article) {
    $article = str_get_html($article);
    // No upvote arrow found, so it's not a valid article
    if (count($article->find('img')) === 0) {
        continue;
    }
}
And for the other elements, use the same kind of subselectors.
Well, thanks to Ivan's train of thought, I am now splitting the initially scraped HTML into an array, each node representing a post. Then, looping through every post, I check whether the upvote arrow image exists; if not, the post is not added to the result. At the end, everything is stitched back together and the sponsored post is left out. Here's the code:
$array = explode('<tr style="height:5px"></tr>', $html);
$clean = '';
foreach ($array as $post) {
    // Keep only posts that have the upvote arrow (regular stories)
    if (strstr($post, 'grayarrow.gif')) {
        $clean .= $post;
    }
}
unset($array);
$html = str_get_html($clean . '</body></html>');
I am using the Google Analytics PHP class to get data from Google Analytics.
http://code.google.com/p/gapi-google-analytics-php-interface/wiki/GAPIDocumentation
I would like to get a report of "Bounce Rate" for "Top Content".
The thing is I am not familiar with the terminology.
When I try to request a "content" report, or "topcontent", or "top_content", it says that there is no such metric. I simply don't know the right expressions.
Does anyone know where I can find a list of all the expressions, metrics, and dimensions?
Thanks.
Top content isn't a metric, it's just a list of the pages on your site with the highest number of page views.
The metric you're looking for is 'entranceBounceRate' and the dimension is 'pagePath'. You want to get the bounce rate for the top X most visited pages on your site, so you'll want to limit your results and sort the results by '-pageviews' (pageviews descending).
If you want to get the bounce rate for the top 10 most viewed pages on your site, your query should look like this:
$ga = new gapi('email@yourdomain.com', 'password');
$ga->requestReportData(145141242, array('pagePath'), array('entranceBounceRate','pageviews'), array('-pageviews'), null, null, null, 10);
The Google Analytics Export API has a data feed query explorer that should help you out considerably when using GAPI:
http://code.google.com/apis/analytics/docs/gdata/gdataExplorer.html
Also, here's a list of all available dimensions and metrics you can pull from the API:
http://code.google.com/apis/analytics/docs/gdata/gdataReferenceDimensionsMetrics.html
Definitely read over the GAPI documentation:
http://code.google.com/p/gapi-google-analytics-php-interface/wiki/GAPIDocumentation
If you would like to get the global bounce rate for the last 30 days (the default), here is how. Very simple once you know it.
//Check Bounce Rate for the last 30 days
$ga = new gapi(ga_email, ga_password);
$ga->requestReportData(145141242, NULL ,array('bounces', 'visits'));
$data = round(($ga->getBounces() / $ga->getVisits()) * 100) . "%";
Note that GAPI has a bug: the documentation says the dimensions parameter (the 2nd parameter) is optional, but it's not. You have to open gapi.class.php and patch line 128 with this:
// Patch bug to make the 2nd parameter optional
if (!empty($dimensions)) {
    $parameters['dimensions'] = 'ga:' . $dimensions;
} else {
    $parameters['dimensions'] = '';
}