Wikipedia API - get full information from infobox - php

I am trying to extract the parent company information (in the infobox pane) for a page such as "KFC".
If you open
http://en.wikipedia.org/wiki/KFC
in a browser, the infobox contains the property (Parent = Yum! Brands).
However, when I query the page through the API from PHP, the parent info is not included:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=KFC&rvsection=0
How do I ensure that the Wikipedia API returns the "Parent = " information as well (for a brand term like "KFC")? Essentially, I want to extract the fact that Yum! Brands is the parent of KFC through the Wikipedia API.
Thanks!

Take a look at the official ways of getting information out of Wikipedia.
My suggestion is to screen-scrape the page with PHP Simple HTML DOM Parser, which in my experience works best, even if it's deprecated. The only downside is that if Wikipedia changes its page layout, you will have to update your code.
A guide to PHP Simple HTML DOM Parser.
Edit:
At least i'm doing something instead of linking to non working resources and downvoting right answers ...
Here's the code I made to get the Parent company information from the Infobox pane with the PHP Simple HTML DOM Parser.
<?php
// The folder where you uploaded simple_html_dom.php
require_once('/homepages/../htdocs/simple_html_dom.php');

// Wikipedia page to parse
$html = file_get_html('http://en.wikipedia.org/wiki/KFC');

foreach ($html->find('tr th a[title=Holding company]') as $element) {
    // Walk up from the <a> to its <th>, then to the enclosing <tr>
    $element = $element->parent;
    $element = $element->parent;
    // The first <td> of that row holds the value
    $tabella = $element->find('td', 0);
    // Now $parent contains "Yum! Brands"
    $parent = $tabella->plaintext;
    echo $parent;
}
?>
If this answer suits your needs, please choose it as the best answer and upvote it, because it took me a lot of effort, about 1 hour =/
Thanks ;)
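If the API response does include the infobox wikitext, a field can also be pulled out of that text directly. The following is only a minimal sketch: the function name infobox_field and the sample wikitext are made up for illustration, and it assumes the value sits on a single line of the form "| parent = ...". Nested templates and multi-line values are not handled.

```php
<?php
// Hypothetical helper: returns the value of one infobox parameter
// from raw wikitext, or null when the parameter is not present.
function infobox_field($wikitext, $name)
{
    // Match a line such as "| parent = [[Yum! Brands]]" (case-insensitive).
    $pattern = '/^[ \t]*\|[ \t]*' . preg_quote($name, '/') . '[ \t]*=[ \t]*(.*)$/mi';
    if (!preg_match($pattern, $wikitext, $m)) {
        return null;
    }
    $value = trim($m[1]);
    // Reduce wiki links: [[Target|Label]] -> Label, [[Target]] -> Target.
    return preg_replace('/\[\[(?:[^\]|]*\|)?([^\]]*)\]\]/', '$1', $value);
}

// Made-up sample, shaped like the start of a company infobox.
$sample = "{{Infobox company\n| name = KFC\n| parent = [[Yum! Brands]]\n| industry = Restaurants\n}}";

echo infobox_field($sample, 'parent'); // prints "Yum! Brands"
```

In practice $sample would be replaced by the revision content decoded from the JSON response of the API URL in the question.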

Related

PHP check if XML node exists before saving to variable?

Here is a snippet of the xml I am working with:
My example xml
A client requested that we add the ability to filter which types of "news articles" are displayed on specific pages. They create these articles on another website, where they now have the ability to assign one or more categories to each of the articles. We load the articles via PHP and XML.
The error I receive is:
Call to a member function getElementsByTagName() on null in ...
Here is the code from 2012 that I am working with:
$item = $dom_object->getElementsByTagName("Releases");
foreach( $item as $value )
{
    $Release = $value->getElementsByTagName("Release");
    foreach($Release as $ind_Release){
        $Title = $ind_Release->getElementsByTagName("Title");
        $PublishDateUtc = $ind_Release->getAttribute('PublishDateUtc');
        $DetailUrl = $ind_Release->getAttribute('DetailUrl');
        $parts = explode('/', $DetailUrl);
        $last = end($parts);
I am trying to traverse to the category code and set a variable with:
$newsCategory = $ind_Release->getElementsByTagName("Categories")->item(0)->getElementsByTagName("Category")->item(0)->getElementsByTagName("Code")->item(0)->nodeValue;
This loads the current 2018 articles with the category slug being echoed, because they have an assigned category, but it fails to load 2017, 2016, and so on, I believe, because they are not assigned a category within the XML and this is breaking something.
A news article without a category appears with an empty categories node within XML
I understand that I am using getElementsByTagName, and because there is no element beyond the first categories node it breaks.
Is there a way to check that there is indeed a path to Categories->Category->Code[CDATA] before trying to set it as a variable and breaking it?
I apologize if this is confusing. I am not a PHP expert and could use all the help I can get. Is there a better way to traverse to the needed node?
Thanks.
You need to use XPath. If you're using DOMDocument, this is done via DOMXpath.
Your current approach uses chaining, and the problem with chaining is it breaks down if a particular juncture of it doesn't return what the following method relies on. Hence your error.
Instead, check the whole path from the start:
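$domxp = new DOMXPath($dom_object);
// Relative path, evaluated against the current <Release> node
$nodes = $domxp->query('Categories[1]/Category[1]/Code[1]', $ind_Release);
if ($nodes->length > 0) {
    // found - do something
    $newsCategory = $nodes->item(0)->nodeValue;
}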
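To see the check in context, here is a small self-contained sketch. The XML is a made-up sample that only mirrors the element names from the question (Releases, Release, Categories, Category, Code); the real feed may differ. The path is evaluated relative to each Release, and an article without a category simply yields null instead of an error.

```php
<?php
// Made-up sample: one release with a category, one with an empty <Categories/>.
$xml = <<<XML
<Releases>
  <Release DetailUrl="/news/2018/first">
    <Title>With category</Title>
    <Categories><Category><Code><![CDATA[press]]></Code></Category></Categories>
  </Release>
  <Release DetailUrl="/news/2017/second">
    <Title>Without category</Title>
    <Categories/>
  </Release>
</Releases>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$categories = array();
foreach ($dom->getElementsByTagName('Release') as $release) {
    // Query relative to this <Release>; an empty result list means "no category".
    $nodes = $xpath->query('Categories/Category/Code', $release);
    $categories[] = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
}

var_dump($categories); // first entry "press", second entry NULL
```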

Specification of mark-up format included in facebook open graph text

When I am performing Open Graph requests, some of the responses that I am expecting to be text have some kind of markup included. For example, when I am requesting the Name and Description of an album, in the description I get something like \u0040[12412421421:124:The Link]. (The \u0040 is actually the @ sign.)
In this case it seems that what it is saying is that the 'The Link' should be a hyperlink to a facebook page with ID 12412421421. I presume there is similar kind of markup for hashtags and external URLs.
I am trying to find some official documentation or description for this, but I can't seem to find any documentation of this (I might be looking with the wrong keywords).
Is there any online documentation that describes this? And better still, is there a PHP library or function already available somewhere that converts this text into its HTML equivalent?
I am using this Facebook PHP SDK, but it doesn't seem to offer any such function. (Not sure if there is anything in the new version 4.0, but I can't use it anyway for now because it requires PHP 5.4+ and my host is currently still on 5.3.)
It's true that the PHP SDK doesn't provide anything to deal with these links and the documentation doesn't document that either. However the API gives all the information you need in the description field itself, so here is what you could do:
$description = "Live concert with @[66961492640:274:Moonbootica] "
    . "in @[106078429431815:274:London, United Kingdom]! #music #house";

function get_html_description($description) {
    return
        // 1. Handle tags (pages, people, etc.)
        preg_replace_callback("/@\[([0-9]*):([0-9]*):(.*?)\]/", function($match) {
            // Link target is an assumption: facebook.com/<id>
            return '<a href="https://www.facebook.com/'.$match[1].'">'.$match[3].'</a>';
        },
        // 2. Handle hashtags
        preg_replace_callback("/#(\w+)/", function($match) {
            // Link target is an assumption: facebook.com/hashtag/<tag>
            return '<a href="https://www.facebook.com/hashtag/'.$match[1].'">'.$match[0].'</a>';
        },
        // 3. Handle line breaks
        str_replace("\n", "<br />", $description)));
}

// Display HTML
echo get_html_description($description);
While 2. and 3. handle hashtags and line breaks, part 1. of the code splits the tag @[ID:TYPE:NAME] into 3 groups of information (id, type, name) before generating HTML links from the page IDs and names:
Live concert with Moonbootica in London, United Kingdom! #music #house
Live concert with Moonbootica in London, United Kingdom!
#music #house
FYI, even if it's not very useful, here are the meanings of the types:
an app (128),
a page (274),
a user (2048).
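If you need the tags as data rather than as HTML, the same pattern can be used to collect them. This is a minimal sketch that assumes the raw API text uses the @[ID:TYPE:NAME] form (\u0040 is the @ character); the function name parse_tags is made up, and the type labels cover only the three codes listed above, so any other code is reported as 'unknown'.

```php
<?php
// Hypothetical helper: extracts all @[ID:TYPE:NAME] tags from a description.
function parse_tags($text)
{
    // Only the codes named in the answer; other codes may exist.
    $types = array(128 => 'app', 274 => 'page', 2048 => 'user');

    preg_match_all('/@\[(\d+):(\d+):(.*?)\]/', $text, $matches, PREG_SET_ORDER);

    $tags = array();
    foreach ($matches as $m) {
        $code = (int) $m[2];
        $tags[] = array(
            'id'   => $m[1], // kept as a string, IDs can exceed 32-bit integers
            'type' => isset($types[$code]) ? $types[$code] : 'unknown',
            'name' => $m[3],
        );
    }
    return $tags;
}

$tags = parse_tags("Live concert with @[66961492640:274:Moonbootica] #music");
print_r($tags);
```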
The @ marks a tag of someone. A Facebook ID does not distinguish between a fan page and a single person, so you have to deal with that in PHP yourself. The @ should be the only character that introduces a tagged person or page.
The markup is used to reference a fan page.
Example:
"description": "Event organised by @[303925999750490:274:World Next Top Model MALTA]\nPhotography by @[445645795469650:274:Pixbymax Photography]"
The 303925999750490 is the fan page ID. The World Next Top Model MALTA is the name of the fan page. (I don't know what the 274 means.)
When you render this on your page, you can render like this:
Event organised by World Next Top Model MALTA
Photography by Pixbymax Photography

create force directed layout gexf file with node position generation

In my database I have nodes and edges.
The positions must be generated when the GEXF file is generated; the nodes must not overlap, and the result should
look like this kind of graph: http://www.nwoods.com/components/images/force-directed-layout.png
I use Sigma.js for presenting the graph.
How can I calculate the node positions with a force-directed algorithm, with a root item?
Or is there a layout that can produce, from nodes and edges, a non-overlapping layout like the PNG above?
EDIT:
PHP code for generating node position:
function _generate_gexf_node(array $node_array, $test = false) {
    $data = array('node' => array());
    $count = 0;
    foreach ($node_array as $node) {
        $node_size = '22.714287';
        // Positions are random for now; this is the part to replace with a layout
        $node_poz = ' x="'.rand(10, 300).'" y="'.rand(10, 300).'" z="'.rand(10, 300).'" ';
        $node_color = ' b="45" g="72" r="216" ';
        $data['node'][] = '<node id="'.$node['node_id'].'" label="'.$node['label'].'">
            <attvalues>
                <attvalue for="authority" value="0.01880342"/>
                <attvalue for="hub" value="0.01880342"/>
            </attvalues>
            <viz:size value="'.$node_size.'"/>
            <viz:color '.$node_color.'/>
            <viz:position '.$node_poz.'/>
        </node>
        ';
        $count++;
    }
    return $data;
}
How can I generate the positions at "runtime", so the result looks like this: http://www.nwoods.com/components/images/force-directed-layout.png?
Where can I find an implementation of the Yifan Hu force-directed algorithm?
Or a special bubble layout implementation?
At the moment I am also working on a web graph visualization, but in JavaScript.
There is a good project called gexf-js (https://github.com/raphv/gexf-js),
but it only draws the graph.
For the layout I used the Gephi library (it is implemented in Java);
you can download it here: https://gephi.org/toolkit/. I also found a good example
for the Yifan Hu layout algorithm.
If it can be a manual task, you can import your GEXF file into the Gephi program,
run the layout you want, and export it as GEXF again. Most of the functions in the Gephi program
are also available in the toolkit library.
Hopefully this helps.
Have a nice day.
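If the positions have to be computed in PHP while the GEXF file is generated, a basic force-directed iteration is short enough to write by hand. The sketch below is a simplified Fruchterman-Reingold-style layout, not the Yifan Hu algorithm from Gephi. The function name force_layout, its parameters and the cooling factor are illustrative choices, and the root node is simply pinned to the center of the area. It reduces overlap in practice but does not guarantee it.

```php
<?php
// Simplified Fruchterman-Reingold-style layout (illustrative, not Yifan Hu).
// $nodes: list of node ids; $edges: list of array(sourceId, targetId).
// Returns array(id => array('x' => float, 'y' => float)) inside [0, $size].
function force_layout(array $nodes, array $edges, $size = 300.0, $iterations = 200, $seed = 1, $root = null)
{
    $n = count($nodes);
    if ($n === 0) {
        return array();
    }
    mt_srand($seed);                  // reproducible start positions
    $k = sqrt($size * $size / $n);    // ideal distance between nodes
    $pos = array();
    foreach ($nodes as $id) {
        $pos[$id] = ($id === $root)
            ? array('x' => $size / 2, 'y' => $size / 2)
            : array('x' => mt_rand(0, 1000) / 1000 * $size, 'y' => mt_rand(0, 1000) / 1000 * $size);
    }
    $temp = $size / 10;               // maximum step per iteration
    for ($it = 0; $it < $iterations; $it++) {
        $disp = array();
        foreach ($nodes as $id) {
            $disp[$id] = array('x' => 0.0, 'y' => 0.0);
        }
        // Repulsion: every pair of nodes pushes apart with force k^2 / d.
        foreach ($nodes as $a) {
            foreach ($nodes as $b) {
                if ($a === $b) {
                    continue;
                }
                $dx = $pos[$a]['x'] - $pos[$b]['x'];
                $dy = $pos[$a]['y'] - $pos[$b]['y'];
                $d  = max(sqrt($dx * $dx + $dy * $dy), 0.01);
                $disp[$a]['x'] += $dx / $d * ($k * $k / $d);
                $disp[$a]['y'] += $dy / $d * ($k * $k / $d);
            }
        }
        // Attraction: connected nodes pull together with force d^2 / k.
        foreach ($edges as $e) {
            list($a, $b) = $e;
            $dx = $pos[$a]['x'] - $pos[$b]['x'];
            $dy = $pos[$a]['y'] - $pos[$b]['y'];
            $d  = max(sqrt($dx * $dx + $dy * $dy), 0.01);
            $f  = $d * $d / $k;
            $disp[$a]['x'] -= $dx / $d * $f;
            $disp[$a]['y'] -= $dy / $d * $f;
            $disp[$b]['x'] += $dx / $d * $f;
            $disp[$b]['y'] += $dy / $d * $f;
        }
        // Move each node, limited by the temperature and clamped to the area.
        foreach ($nodes as $id) {
            if ($id === $root) {
                continue;             // the root stays pinned at the center
            }
            $len  = max(sqrt($disp[$id]['x'] * $disp[$id]['x'] + $disp[$id]['y'] * $disp[$id]['y']), 0.01);
            $step = min($len, $temp);
            $pos[$id]['x'] = min($size, max(0.0, $pos[$id]['x'] + $disp[$id]['x'] / $len * $step));
            $pos[$id]['y'] = min($size, max(0.0, $pos[$id]['y'] + $disp[$id]['y'] / $len * $step));
        }
        $temp *= 0.97;                // cool down so the layout settles
    }
    return $pos;
}

$pos = force_layout(
    array('root', 'a', 'b', 'c'),
    array(array('root', 'a'), array('root', 'b'), array('root', 'c')),
    300.0, 100, 1, 'root'
);
```

The resulting x and y values can be written into the viz:position element in place of the rand() calls. The pairwise repulsion loop is quadratic in the number of nodes, so for large graphs running the Gephi toolkit offline is likely the better route.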

How can I get an id from a simplepie post which can be used to look it up later?

I've recently started developing a portfolio website which I would like to link to my wordpress blog using simplepie. It's been quite a smooth process so far - loading names and descriptions of posts, and linking them to the full post was quite easy. However, I would like the option to render the posts in my own website as well. Getting the full content of a given post is simple, but what I would like to do is provide a list of recent posts which link to a php page on my portfolio website that takes a GET variable of some sort to identify the post, so that I can render the full content there.
That's where I've run into problems - there doesn't seem to be any way to look up a post according to a specific id or name or similar. Is there any way I can pull some unique identifier from a post object on one page, then pass the identifier to another page and look up the specific post there? If that's impossible, is there any way for me to simply pass the entire post object, or temporarily store it somewhere so it can be used by the other page?
Thank you for your time.
I stumbled across your question looking for something else about simplepie. But I do work with an identifier while using simplepie. So this seems to be the answer to your question:
My getFeedPosts-function in PHP looks like this:
public function getFeedPosts($numberPosts = null) {
    $feed = new SimplePie(); // default options
    $feed->set_feed_url('http://yourname.blogspot.com'); // Set the feed
    $feed->enable_cache(true); /* Enable caching */
    $feed->set_cache_duration(1800); /* seconds to cache the feed */
    $feed->init(); // Run SimplePie.
    $feed->handle_content_type();

    $allFeeds = array();
    $number = $numberPosts > 0 ? $numberPosts : 0;
    foreach ($feed->get_items(0, $number) as $item) {
        $singleFeed = array(
            'author' => $item->get_author(),
            'categories' => $item->get_categories(),
            'copyright' => $item->get_copyright(),
            'content' => $item->get_content(),
            'date' => $item->get_date("d.m.Y H:i"),
            'description' => $item->get_description(),
            'id' => $item->get_id(),
            'latitude' => $item->get_latitude(),
            'longitude' => $item->get_longitude(),
            'permalink' => $item->get_permalink(),
            'title' => $item->get_title()
        );
        array_push($allFeeds, $singleFeed);
    }
    $feed = null;
    return json_encode($allFeeds);
}
As you can see, I build an associative array and return it as JSON, which makes it really easy to use with jQuery and Ajax (in my case) on the client side.
The 'id' is a unique identifier of every post in my blog, so this is the key to identifying the same post in another function or on another page. You just have to iterate over the posts and compare this id. As far as I can see, there is no get_item($ID) function. There is a get_item($key) function, but it just takes a specific post out of the list of all posts by its array position (which is nearly the same as what I suggest).
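The iterate-and-compare step could look like the sketch below. It works on the decoded array from getFeedPosts, so it does not touch SimplePie itself. The helper names post_key and find_post_by_key are made up, and hashing the id with md5 is only a convenience so that the value is short and safe to put into a GET parameter (the raw id is often a URL).

```php
<?php
// Hypothetical helper: a short, URL-safe key derived from the post id.
function post_key(array $post)
{
    return md5($post['id']);
}

// Hypothetical helper: returns the post whose key matches, or null.
function find_post_by_key(array $posts, $key)
{
    foreach ($posts as $post) {
        if (post_key($post) === (string) $key) {
            return $post;
        }
    }
    return null;
}

// Made-up data in the shape produced by json_decode(getFeedPosts(), true).
$posts = array(
    array('id' => 'http://example.com/?p=1', 'title' => 'First post'),
    array('id' => 'http://example.com/?p=2', 'title' => 'Second post'),
);

// List page: build the link. Detail page: look the post up again.
$key  = post_key($posts[1]);
$post = find_post_by_key($posts, $key);
echo $post['title']; // prints "Second post"
```

On the detail page the key would come from something like $_GET['post'] instead of being computed locally.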

How do I set up an RSS feed for my hard-coded php & mysql site?

I have absolutely no idea how to start one. Every tutorial I find assumes I have a cms or blog of some sort. Mine's not exactly. I upload everything and coded all my css, html, mysql, php, and such. So how do I create an RSS feed?
I'm guessing I need to use a php include right?
Also I want my RSS feed to be automated if possible. Like all it'll need to know is the title of my page, and then the RSS will send it out to all my subscribers with the link of the page as the only description.
Please post any info you have though, as beggars can't be choosers.
Thanks!
Generate a list of filenames, order them by timestamp, read them, extract title and content snippets, and finally print out an RSS document. Example:
// list + sort (newest first)
$files = glob("pages/*.html");
$files = array_combine($files, array_map("filemtime", $files));
arsort($files);
// loop + read
$rss = array();
foreach ($files as $fn => $mtime) {
    $html = file_get_contents($fn);
    // fall back to the filename when the page has no <title>
    $title = preg_match('#<title>([^<]+)#', $html, $m) ? $m[1] : $fn;
    $rss[] = array(
        "link" => $fn,
        "pubDate" => $mtime,
        "title" => $title,
        "description" => substr(strip_tags($html), 0, 100),
    );
}
// write RSS
foreach ($rss ...)
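The "write RSS" step that is left open above could be sketched as follows. This is one possible way to do it, not the answerer's code: the function name build_rss, the channel title and the example.com site URL are placeholders, and the items are assumed to have the same keys as the $rss entries (link, pubDate as a Unix timestamp, title, description).

```php
<?php
// Hypothetical helper: builds an RSS 2.0 document as a string.
function build_rss(array $items, $siteUrl, $channelTitle)
{
    // Escape text for use inside XML elements.
    $esc = function ($s) {
        return htmlspecialchars($s, ENT_QUOTES | ENT_XML1, 'UTF-8');
    };

    $out  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $out .= '<rss version="2.0"><channel>' . "\n";
    $out .= '<title>' . $esc($channelTitle) . "</title>\n";
    $out .= '<link>' . $esc($siteUrl) . "</link>\n";
    $out .= '<description>' . $esc($channelTitle) . "</description>\n";

    foreach ($items as $item) {
        $out .= "<item>\n";
        $out .= '<title>' . $esc($item['title']) . "</title>\n";
        // Item links must be absolute, so prefix the relative path with the site URL.
        $out .= '<link>' . $esc(rtrim($siteUrl, '/') . '/' . $item['link']) . "</link>\n";
        $out .= '<pubDate>' . date(DATE_RSS, $item['pubDate']) . "</pubDate>\n";
        $out .= '<description>' . $esc($item['description']) . "</description>\n";
        $out .= "</item>\n";
    }

    return $out . "</channel></rss>\n";
}

// Made-up item in the same shape as the $rss entries.
$items = array(array(
    'link'        => 'pages/a.html',
    'pubDate'     => 0,
    'title'       => 'A & B',
    'description' => 'First page',
));
$xml = build_rss($items, 'http://example.com', 'My site');
echo $xml;
```

When this is served from a PHP script, sending a Content-Type header of application/rss+xml before the output helps feed readers recognize it.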
Manually create a file containing the RSS XML referencing the pages from your site that you want in your feed. As you add new pages to your site, update that RSS file. The file should be stored along with other files comprising your site.
See the example on Wikipedia for the format: http://en.wikipedia.org/wiki/RSS#Example
Read up on RSS (http://www.w3schools.com/rss/default.asp). You don't have to send anything out; just update the RSS feed, and if they are subscribed, the change will propagate through to the end user. This can either be a semi-automated process that pulls in information as you update your pages (which is why tutorials presuppose a blog or CMS), or you can update the feed manually.
