Specification of the markup format included in Facebook Open Graph text - PHP

When I perform Open Graph requests, some of the responses that I expect to be plain text include some kind of markup. For example, when I request the name and description of an album, the description contains something like \u0040[12412421421:124:The Link]. (The \u0040 is actually the @ sign.)
In this case it seems to be saying that 'The Link' should be a hyperlink to a Facebook page with ID 12412421421. I presume there is similar markup for hashtags and external URLs.
I am trying to find official documentation or a description of this format, but I can't seem to find any (I might be searching with the wrong keywords).
Is there any online documentation that describes this? And better still, is there a PHP library or function already available somewhere that converts this text into its HTML equivalent?
I am using the Facebook PHP SDK, but it doesn't seem to offer any such function. (I'm not sure whether the new version 4.0 has anything, but I can't use it anyway for now because it requires PHP 5.4+ and my host is still on 5.3.)

It's true that the PHP SDK doesn't provide anything to deal with these links, and the official documentation doesn't cover the format either. However, the API gives all the information you need in the description field itself, so here is what you could do (the link targets below use the standard facebook.com/<ID> and facebook.com/hashtag/<tag> URL patterns):
$description = "Live concert with #[66961492640:274:Moonbootica] "
. "in #[106078429431815:274:London, United Kingdom]! #music #house";
function get_html_description($description) {
return
// 1. Handle tags (pages, people, etc.)
preg_replace_callback("/#\[([0-9]*):([0-9]*):(.*?)\]/", function($match) {
return ''.$match[3].'';
},
// 2. Handle hashtags
preg_replace_callback("/#(\w+)/", function($match) {
return ''.$match[0].'';
},
// 3. Handle breaklines
str_replace("\n", "<br />", $description)));
}
// Display HTML
echo get_html_description($description);
While parts 2 and 3 handle hashtags and line breaks, part 1 of the code splits each @[ID:TYPE:NAME] tag into 3 groups of information (id, type, name) before generating HTML links from the page IDs and names. For the sample description above, the rendered output is:
Live concert with Moonbootica in London, United Kingdom! #music #house
where Moonbootica, London, United Kingdom, #music and #house are all links to the corresponding Facebook pages and hashtags.
FYI, and even if it's not very useful, here are the meanings of the types:
an app (128),
a page (274),
a user (2048).
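If you need to branch on the type, a small lookup could look like this (the labels are informal names taken from the list above, not official Facebook constants):
// Map the TYPE field of an @[ID:TYPE:NAME] tag to an informal label
function get_tag_type_label($type) {
    $types = array(
        128  => 'app',
        274  => 'page',
        2048 => 'user',
    );
    return isset($types[$type]) ? $types[$type] : 'unknown';
}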

The @ marks a tag of someone. A Facebook ID doesn't distinguish between a fan page and a single person, so you have to make that distinction in PHP yourself. The @ should be the only character that marks a tagged person/page.

The markup is used to reference a fanpage.
Example:
"description": "Event organised by #[303925999750490:274:World Next Top Model MALTA]\nPhotography by #[445645795469650:274:Pixbymax Photography]"
The 303925999750490 is the fan page ID and World Next Top Model MALTA is the name of the fan page. (I don't know what the 274 means; according to the answer above, it identifies a page.)
When you render this on your page, you can output it like this, with each name linked to the corresponding Facebook page (https://www.facebook.com/<ID>):
Event organised by World Next Top Model MALTA
Photography by Pixbymax Photography
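For example, feeding this description through the get_html_description() helper from the first answer (a sketch, assuming that helper is in scope) produces the linked version:
$description = "Event organised by @[303925999750490:274:World Next Top Model MALTA]\n"
             . "Photography by @[445645795469650:274:Pixbymax Photography]";
// Converts each @[ID:TYPE:NAME] tag into a link and each \n into <br />
echo get_html_description($description);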

Related

How can I add directions using Google Map URLs from an array?

I'm using a WordPress directory theme: there is an address with street, number and city.
I get the values from an array and convert them into a Google Maps location link (a button that says "Open in Google map"). I want to change this so that it opens Google Maps and plans the route from my location.
I think $url holds the address and is appended to the Google Maps URL, but should I add something more so that it also plans the route from my location?
if( !function_exists('estate_listing_address') ):
    function estate_listing_address($post_id, $col = 3){
        $property_address = esc_html( get_post_meta($post_id, 'property_address', true) );
        $property_city    = strip_tags( get_the_term_list($post_id, 'property_city', '', ', ', '') );
        $url              = urlencode($property_address.','.$property_city);
        $google_map_url   = "http://maps.google.com/?q=".$url;
        // Button that currently only shows the location on Google Maps
        $return_string    = '<a href="'.$google_map_url.'" target="_blank">'.__('Route plan','wpestate').'</a>';
        return $return_string;
    }
endif;
I think the question is a bit off, since you can already use the values from the array. I understand your concern as follows: you want to change the request from showing the location to getting the route/directions.
So here's what you need to do if that's the case:
To get a route between 2 points, you need an origin and a destination. You can store them in variables like this:
$origin = "Chicago";
$destination = "Indianapolis";
I used Chicago here as a sample origin and Indianapolis as a sample destination. In your case, you would change the origin to your location and the destination to the value from the array.
Now we urlencode the origin and destination values to make them URL-safe in case they contain special characters, and store the resulting query string in a $url variable, much like you did. Note that only the values are encoded, not the & and = separators:
$url = '&origin='.urlencode($origin).'&destination='.urlencode($destination);
This time, the origin and destination are passed as URL parameters.
And lastly, for the Google Maps URL, put https://www.google.com/maps/dir/?api=1 at the start of your URL string and append $url at the end:
$google_map_url = "https://www.google.com/maps/dir/?api=1".$url;
So without the variables, the request is basically like this:
https://www.google.com/maps/dir/?api=1&origin=Chicago&destination=Indianapolis
You may read more in the official documentation:
https://developers.google.com/maps/documentation/urls/guide#directions-action
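Putting it together with your original theme function, a minimal sketch could look like this (the function name estate_listing_route_link and the $origin parameter are my own inventions; in practice you would fill the origin with the visitor's location, e.g. passed in from browser geolocation):
if ( ! function_exists( 'estate_listing_route_link' ) ) :
    function estate_listing_route_link( $post_id, $origin = '' ) {
        // Destination comes from the listing's stored address and city
        $property_address = esc_html( get_post_meta( $post_id, 'property_address', true ) );
        $property_city    = strip_tags( get_the_term_list( $post_id, 'property_city', '', ', ', '' ) );
        $destination      = $property_address . ',' . $property_city;
        // Encode each value separately so the & and = separators stay intact
        $google_map_url = 'https://www.google.com/maps/dir/?api=1'
            . '&origin=' . urlencode( $origin )
            . '&destination=' . urlencode( $destination );
        return '<a href="' . esc_url( $google_map_url ) . '" target="_blank">' . __( 'Route plan', 'wpestate' ) . '</a>';
    }
endif;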
Please be advised that abusing this functionality can be a violation of the Google Maps Terms of Service. It is in the Prohibited Conduct section of the TOS: When using Google Maps/Google Earth, you may not (or allow those acting on your behalf to):
a. redistribute or sell any part of Google Maps/Google Earth or create a new product or service based on Google Maps/Google Earth (unless you use the Google Maps/Google Earth APIs in accordance with their terms of service);
b. copy the Content (unless you are otherwise permitted to do so by the Using Google Maps, Google Earth, and Street View permissions page or applicable intellectual property law, including "fair use");
and
d. use Google Maps/Google Earth to create or augment any other mapping-related dataset (including a mapping or navigation dataset, business listings database, mailing list, or telemarketing list) for use in a service that is a substitute for, or a substantially similar service to, Google Maps/Google Earth
Read more about the Google Maps TOS here.
I would suggest using the Google Maps Javascript Directions API for applications like this instead of using the maps.google.com URL request. (In fact, it is much cooler to show the map in your page instead of navigating to a new page or popping up a new tab just to get directions)
Hope this helps!!

How to get book title from ISBN with Knowledge Graph?

I volunteer in a communal library and I'm in charge of the digital transition.
I'm using the free and open-source software PMB and I want to automate the retrieval of book titles with the Knowledge Graph API (which is not possible with PMB, unless I missed something).
Why use Knowledge Graph instead of ISBNdb or another free ISBN API? Because none of them is as complete and high-quality as KG.
For example, I take the ISBN of a French book: 9782884613736 ("Le foot illustré de A à Z").
Not found on ISBNdb.com, etc.
So when I google it, the Knowledge Graph returns exactly what I want:
> Screenshot of what I see
But when I'm using the API:
GET https://kgsearch.googleapis.com/v1/entities:search?languages=fr&query=9782884613736&types=Book&key={YOUR_API_KEY}
{
  "@context": {
    "@vocab": "http://schema.org/",
    "goog": "http://schema.googleapis.com/",
    "EntitySearchResult": "goog:EntitySearchResult",
    "detailedDescription": "goog:detailedDescription",
    "kg": "http://g.co/kg"
  },
  "@type": "ItemList",
  "itemListElement": []
}
Nothing is returned for my GET request. (It works properly if I query by the book title instead; then the information comes back fine.)
I tried with different types according to schema.org : Book, BookSeries, BookFormatType.
Is there a way to use the KG API the way I want?
I'm totally open to all suggestions (even to use another method to reach my aim).
Thank you.
The ISBN seems to be wrong or doesn't exist in Google's database. Here's a sample using the Google Books API:
ISBN_10 : 2884610154
https://www.googleapis.com/books/v1/volumes?languages=fr&q=isbn:2884610154
ISBN_13 : 9782884610155
https://www.googleapis.com/books/v1/volumes?languages=fr&q=isbn:9782884610155
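For what it's worth, here is a minimal PHP sketch that pulls the title from the Google Books API by ISBN (no error handling, and it assumes allow_url_fopen is enabled):
<?php
// Look up a book title on the Google Books API by ISBN
function get_book_title_by_isbn($isbn) {
    $url  = 'https://www.googleapis.com/books/v1/volumes?q=isbn:' . urlencode($isbn);
    $data = json_decode(file_get_contents($url), true);
    if (empty($data['items'])) {
        return null; // ISBN not found in Google's database
    }
    return $data['items'][0]['volumeInfo']['title'];
}
var_dump(get_book_title_by_isbn('9782884610155')); // the ISBN_13 above
?>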

How to get the entire YouTube Video description, php, gdata

I have PHP code that correctly retrieves, using the YouTube API, the title, video URL, view count, video date, last comment date, and the first 160 characters of the description. I can't seem to figure out how to get the entire description. I know it is there in the retrieved XML, because I have dumped that. So how come I am only getting 160 chars?
The entire description is truncated at 157 chars, and "..." is added, so that by the time I echo it or var_dump it, it is 160 chars. Here is my complete test code (without title, video url, etc etc).
<?php
$feedURL = 'http://gdata.youtube.com/feeds/api/videos?q=phone&v=2&fields=entry[yt:statistics/@viewCount > 10000]&start-index=1&max-results=1';
$sxml = simplexml_load_file($feedURL);
foreach ($sxml->entry as $entry) {
    $media = $entry->children('http://search.yahoo.com/mrss/');
    echo $media->group->description;
}
?>
This is what displays on the page:
FREE TuTiTu's Games: http://www.tutitu.tv/index.php/games FREE TuTiTu's Coloring pages at: http://www.tutitu.tv/index.php/coloring Join us on Facebook: https...
When I get the xml this way:
gdata.youtube.com/feeds/api/videos/JI-5kh_4gO0?v=2&alt=json-in-script&callback=youtubeFeedCallback&prettyprint=true
The entire description looks like this:
"media$description": {
"$t": "FREE TuTiTu's Games: http://www.tutitu.tv/index.php/games\nFREE TuTiTu's Coloring pages at: http://www.tutitu.tv/index.php/coloring\nJoin us on Facebook: https://www.facebook.com/TuTiTuTV\nTuTiTu's T-Shirts: http://www.zazzle.com/TuTiTu?rf=238778092083495163\n\nTuTiTu - The toys come to life\n\nTuTiTu - \"The toys come to life\" is a 3D animated television show targeting 2-3 year olds. Through colorful shapes TuTiTu will stimulate the children's imagination and creativity. On each episode TuTiTu's shapes will transform into a new and exciting toy.",
"type": "plain"
},
I'm sure I am missing something basic, but when I've looked for a solution, I have not found it.
Thanks for any help.
These 2 different types of API requests will return a different description size.
I assume it's a way to limit the total response size.
1) doing a search as in: http://gdata.youtube.com/feeds/api/videos?q=phone&v=2&fields=entry&alt=json&prettyprint=true will return the short video description.
2) doing a video request as in: http://gdata.youtube.com/feeds/api/videos/JI-5kh_4gO0?v=2&alt=json&prettyprint=true will return the long video description.
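For example, a minimal sketch of option 2) using the same SimpleXML approach as your code (video ID hard-coded; note that the gdata v2 API has since been retired by YouTube):
<?php
// Request the single-video feed to get the untruncated description
$videoId = 'JI-5kh_4gO0';
$entry   = simplexml_load_file('http://gdata.youtube.com/feeds/api/videos/' . $videoId . '?v=2');
// The full text lives in the media:group/media:description element
$media = $entry->children('http://search.yahoo.com/mrss/');
echo $media->group->description;
?>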
BTW: API version 3 will allow you to request a list of video IDs in one request (to get their long descriptions).
$media->group->{'media$description'} should do the trick

Scraping HN Front Page - Handling Simple HTML Dom Errors

I'm using 'Simple HTML Dom' to scrape the HN Front Page (news.ycombinator.com), which works great most of the time.
However, every now and then they promote a job/company that lacks the elements that the scraper is looking for, i.e. score, username and number of comments.
This, of course, breaks the array and thus the output of my script:
<?php
// 2012-02-12 Maximilian (Extract news.ycombinator.com's Front Page)
// Set the header during development
//header ("content-type: text/xml");
// Call the external PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/manual.htm)
include('lib/simple_html_dom.php');
date_default_timezone_set('Europe/Berlin');
// Download 'news.ycombinator.com' content
//$tmp = file_get_contents('http://news.ycombinator.com');
//file_put_contents('get.tmp', $tmp);
// Retrieve the content
$html = file_get_html('tc.tmp');
// Set the extraction pattern for each item
$title = $html->find("tr td table tr td.title a");
$score = $html->find("tr td.subtext span");
$user = $html->find("tr td.subtext a[href^=user]");
$link = $html->find("tr td table tr td.title a");
$time = $html->find("tr td.subtext");
$additionals = $html->find("tr td.subtext a[href^=item?id]");
// Construct the feed by looping through the items
for ($i = 0; $i < 29; $i++) {
    $cr = 1;
    // Check if the item points to an external website
    if (!strstr($link[$i]->href, 'http')) {
        $url = 'http://news.ycombinator.com/'.$link[$i]->href;
        $description = "Join the discussion on Hacker News.";
    } else {
        $url = $link[$i]->href;
        // Getting content here
        if (empty($abstract)) {
            $description = "Failed to load any relevant content. Please try again later.";
        } else {
            $description = $abstract;
        }
    }
    // Put all the items together
    $result .= '<item><id>f'.$i.'</id>'
        .'<title>'.htmlspecialchars(trim($title[$i]->plaintext)).'</title>'
        .'<description><![CDATA['.$description.']]></description>'
        .'<pubDate>'.str_replace(' | '.$additionals[$i]->plaintext, '', str_replace($score[$i]->plaintext.' by '.$user[$i]->plaintext.' ', '', $time[$i]->plaintext)).'</pubDate>'
        .'<score>'.$score[$i]->plaintext.'</score>'
        .'<user>'.$user[$i]->plaintext.'</user>'
        .'<comments>'.$additionals[$i]->plaintext.'</comments>'
        .'<id>'.substr($additionals[$i]->href, 8).'</id>'
        .'<discussion>http://news.ycombinator.com/'.$additionals[$i]->href.'</discussion>'
        .'<link>'.htmlspecialchars($url).'</link></item>';
}
$output = '<rss><channel><id>news.ycombinator.com Frontpage</id><buildDate>'.date('Y-m-d H:i:s').'</buildDate>'.$result.'</channel></rss>';
file_put_contents('tc.xml', $output);
?>
Here's an example of the correct output
<item>
<id>f0</id>
<title>Show HN: Bootswatch, free swatches for your Bootstrap site</title>
<description><![CDATA[Easy to Install Simply download the CSS file from the swatch of your choice and replace the one in Bootstrap. No messing around with hex values. Whole New Feel We've all been there with the black bar and blue buttons. See how a splash of color and typography can transform the feel of your site. Modular Changes are contained in just two LESS files, enabling modification and ensuring forward compatibility.]]></description>
<pubDate>3 hours ago</pubDate>
<score>196 points</score>
<user>parkov</user>
<comments>30 comments</comments>
<id>3594540</id>
<discussion>http://news.ycombinator.com/item?id=3594540</discussion>
<link>http://bootswatch.com</link>
</item>
<item>
<id>f1</id>
<title>Louis CK inspires Jim Gaffigan to sell comedy special for $5 online</title>
<description><![CDATA[Dear Internet Friends,Inspired by the brilliant Louis CK, I have decided to debut my all-new hour stand-up special on my website, Jimgaffigan.com.Beginning sometime in April, “Jim Gaffigan: Mr. Universe” will be available exclusively for download for only $5. A dollar from each download will go directly to The Bob Woodruff Foundation; a charity dedicated to serving injured Veterans and their families.I am confident that the low price of my new comedy special and the fact that 20% of each $5 download will be donated to this very noble cause will prevent people from stealing it. Maybe I’m being naïve, but I trust you guys.]]></description>
<pubDate>57 minutes ago</pubDate>
<score>25 points</score>
<user>rkudeshi</user>
<comments>4 comments</comments>
<id>3595285</id>
<discussion>http://news.ycombinator.com/item?id=3595285</discussion>
<link>http://www.whosay.com/jimgaffigan/content/218011</link>
</item>
And here's an example of incorrect output. Note that the elements are not empty, thus I cannot seem to catch the error and simply jump to the next item. Everything past the promotion post will break:
<item>
<id>f14</id>
<title>Build the next Legos: We're hiring an iOS Developer & Web Developer (YC S11)</title>
<description><![CDATA[Interested in building the next generation of toys on digital devices such as the iPad? That’s what we’re doing here at Launchpad Toys with apps like Toontastic (Named one of the “Top 10 iPad Apps of 2011” by the New York Times and was recently added to the iTunes Hall of Fame) and an awesom]]><![CDATA[e suite of others we have under development. We’re looking for creative and playful coders that have made games or highly visual apps/sites in the past for our two open development positions. As a kid, you probably played with Legos endlessly and grew up to be a hacker because you still love building things. Sounds like you? Email us at howdy#launchpadtoys.com with a couple links to some projects and code that we can look at along with your resume.]]></description>
<pubDate>2 hours ago</pubDate>
<score>14 points</score>
<user>bproper</user>
<comments>7 comments</comments>
<id>3594944</id>
<discussion>http://news.ycombinator.com/item?id=3594944</discussion>
<link>http://launchpadtoys.com/blog/2012/02/iosdeveloper-webdeveloper/</link>
</item>
<item>
<id>f15</id>
<title>SOPA foe Fred Wilson supports a blacklist on pirate sites</title>
<description><![CDATA[VC Fred Wilson says Google, Bing, Facebook, and Twitter should warn people when they try to log in at known pirate sites: "We don't need legislation." Fred Wilson says: If they try to pass antipiracy legislation, it will once again be 'war.' (Credit: Greg Sandoval/CNET) Fred Wilson, a well-known ven]]><![CDATA[ture capitalist from New York, says he's in favor of creating a blacklist for Web sites found to traffic in pirated films, music, and other intellectual property. The co-founder of Union Square Ventures told a gathering of media executives at the Paley Center for Media yesterday that he believes a good antipiracy measure would be for Google, Twitter, Facebook, and other major sites to issue warnings to people when they try to connect with a known pirate site. Fred Wilson, a co-founder of Union Square Ventures, says 'Our children have been taught to steal.' (Credit: Union Square Ventures) Wilson favors establishing an independent group to create a "black and white list." "The blacklist are those sites we all know are bad news," he told the audience in New York.]]></description>
<pubDate>14 points by bproper 2 hours ago | 7 comments</pubDate>
<score>24 points</score>
<user>andrewcross</user>
<comments>12 comments</comments>
<id>3594558</id>
<discussion>http://news.ycombinator.com/item?id=3594558</discussion>
<link>http://news.cnet.com/8301-31001_3-57377862-261/post-sopa-influential-tech-investor-favors-blacklisting-pirate-sites/</link>
</item>
So here's my question: How can I handle a situation where a particular element is missing and find() doesn't throw an error? Do I have to start from scratch, or is there a better approach in scraping the HN front page?
For anyone curious, here's the whole XML file: http://thequeue.org/api/tc.xml
You have to work in chunks to handle that; there seems to be a dummy spacer element that can help you with this:
$news = preg_split('/<tr style="height:5px"><\/tr>/',$html->find('tbody',2)->innertext);
And then use subselectors:
foreach ($news as $article) {
    $article = str_get_html($article);
    // No upvote arrow found, so it's not a valid article
    if (count($article->find('img')) === 0) {
        continue;
    }
}
And for the other elements you use the same selectors.
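A slightly fuller sketch of that approach (the selectors inside each chunk are assumptions based on the markup used above):
foreach ($news as $chunk) {
    $article = str_get_html($chunk);
    // Job/promo posts have no upvote arrow image, so skip them
    if (count($article->find('img')) === 0) {
        continue;
    }
    // Subselectors now run against this single post only
    $title   = $article->find('td.title a', 0);
    $subtext = $article->find('td.subtext', 0);
    echo trim($title->plaintext) . ' | ' . trim($subtext->plaintext) . "\n";
}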
Well, thanks to Ivan's train of thought, I am now splitting the initially scraped HTML into an array, with each node representing a post. Then, going through every single post in a loop, I check whether the upvote arrow image exists. If not, I don't add it to the result. In the end everything is stitched back together and the sponsored post is left out. Here's the code:
$clean = '';
$array = explode('<tr style="height:5px"></tr>', $html);
foreach ($array as $post) {
    // Only keep posts that contain an upvote arrow (skips job/promo posts)
    if (strstr($post, 'grayarrow.gif')) {
        $clean .= $post;
    }
}
unset($array);
$html = str_get_html($clean.'</body></html>');

PHP Markdown tagging last chunk of content as h3

I'm using PHP Markdown (version 1.0.1n, updated October 2009) to display text saved to a database in markdown format. I'm running into a strange issue where it's tagging the last chunk of every entry as an H3. When I search the markdown.php file, though, there isn't a single instance of H3.
Here are two pieces of text from my database:
Since its launch, major CPG brands, endemic as well as non-endemic, have flocked to retail websites to reach consumers deep in the purchase funnel through shopping media. In this session, you will hear about:
- The prioritization of shopping media for CPG brands.
- A case study of brands on Target.com on how this retailer (and others) have introduced a new channel for brand marketers to engage consumers where they are making the majority of purchase decisions: online.
- How CPG brands are leveraging real-time data from shopping media to capture consumer insights and market trends.
In this one, it is tagging the LI items correctly, but inside the final LI it's tagging the actual text as H3.
Beyond the actual money she saves, this consumer is both empowered and psychologically gratified by getting the best value on her everyday purchases. It is essential for both marketers and retailers to focus on what motivates and activates this consumer.
Diane Oshin will share insights on what influences her shopping behavior and then identify specific tools that activate her to buy.
In this one, the entire paragraph starting with Diane Oshin is tagged as an H3.
Here's the really odd thing: when I do a view source, both of them are tagged correctly; it's only when using Inspect Element that I see the H3. However, it's obvious in the actual display that the H3 tag is being applied:
example 1
example 2
Can anyone help me out?
update
Per a comment below, I looked for instances of H tags. I found these functions, but I don't know whether they could be causing the issue. They are the only places in the entire file that appear to create a header tag of any kind.
function doHeaders($text) {
    # Setext-style headers:
    #   Header 1
    #   ========
    #
    #   Header 2
    #   --------
    #
    $text = preg_replace_callback('{ ^(.+?)[ ]*\n(=+|-+)[ ]*\n+ }mx',
        array(&$this, '_doHeaders_callback_setext'), $text);

    # atx-style headers:
    #   # Header 1
    #   ## Header 2
    #   ## Header 2 with closing hashes ##
    #   ...
    #   ###### Header 6
    #
    $text = preg_replace_callback('{
            ^(\#{1,6})  # $1 = string of #\'s
            [ ]*
            (.+?)       # $2 = Header text
            [ ]*
            \#*         # optional closing #\'s (not counted)
            \n+
        }xm',
        array(&$this, '_doHeaders_callback_atx'), $text);

    return $text;
}

function _doHeaders_callback_setext($matches) {
    # Terrible hack to check we haven't found an empty list item.
    if ($matches[2] == '-' && preg_match('{^-(?: |$)}', $matches[1]))
        return $matches[0];
    $level = $matches[2]{0} == '=' ? 1 : 2;
    $block = "<h$level>".$this->runSpanGamut($matches[1])."</h$level>";
    return "\n" . $this->hashBlock($block) . "\n\n";
}

function _doHeaders_callback_atx($matches) {
    $level = strlen($matches[1]);
    $block = "<h$level>".$this->runSpanGamut($matches[2])."</h$level>";
    return "\n" . $this->hashBlock($block) . "\n\n";
}
I could not reproduce what you describe with the version you mention:
<?php
include(__DIR__.'/php-markdown/markdown.php');
$testText = 'Since its launch, major CPG brands, endemic as well as non-endemic, have flocked to retail websites to reach consumers deep in the purchase funnel through shopping media. In this session, you will hear about:
- The prioritization of shopping media for CPG brands.
- A case study of brands on Target.com on how this retailer (and others) have introduced a new channel for brand marketers to engage consumers where they are making the majority of purchase decisions: online.
- How CPG brands are leveraging real-time data from shopping media to capture consumer insights and market trends.
';
$resultText = Markdown($testText);
var_dump($resultText);
The output looks much as you would expect:
string(649) "<p>Since its launch, major CPG brands, endemic as well as non-endemic, have flocked to retail websites to reach consumers deep in the purchase funnel through shopping media. In this session, you will hear about:</p>
<ul>
<li><p>The prioritization of shopping media for CPG brands.</p></li>
<li><p>A case study of brands on Target.com on how this retailer (and others) have introduced a new channel for brand marketers to engage consumers where they are making the majority of purchase decisions: online.</p></li>
<li><p>How CPG brands are leveraging real-time data from shopping media to capture consumer insights and market trends.</p></li>
</ul>
"
I assume something else is tampering with the data before it gets into the Markdown parser, or afterwards. But based on this data, the Markdown parser does not create the <h3> tags. You must look somewhere else :(
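As a quick sanity check, the only way this parser emits an <h3> is from an atx-style line with three leading hashes, which your text clearly doesn't contain (same Markdown() entry point as above):
<?php
include(__DIR__.'/php-markdown/markdown.php');
// Setext underlines (=== / ---) can only produce <h1> or <h2>,
// so an <h3> would require a line starting with "###"
echo Markdown("### Diane Oshin will share insights\n");
// => <h3>Diane Oshin will share insights</h3>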
