PHP traversing dbpedia xml

I need to traverse a DBpedia XML resource file to get the abstract and some other basic information, such as formation year and budget.
An example would be the US EPA resource (the bottom of the page has links to different data formats of the same file).
I only need the first rdf:Description element of the XML file. A snippet of the code:
$xml_result = file_get_contents($xml_url);
$xml_data = simplexml_load_string($xml_result);
$namespaces = $xml_data->getNamespaces(true);
//print_r($namespaces);
$current = $xml_data->children($namespaces['rdf']);
This only gets me the rdf elements inside the first rdf:Description. How do I get access to other elements, such as the dbpedia-owl namespace elements inside the Description element?

You can use multiple namespaces, see https://stackoverflow.com/a/13350242/865201
Without testing it, I think you can use something like
$xml_data->children($namespaces['rdf'])->Description->children($namespaces['dbpedia-owl'])->anotherElement;
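As a minimal sketch of that chained approach, the snippet below uses a small inline document instead of a live DBpedia download. The resource URI, the element names (abstract, formationYear) and their values are made up for illustration, so check them against the real file before relying on them.

```php
<?php
// Hypothetical stand-in for a DBpedia RDF/XML resource file.
$rdf = <<<XML
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dbpedia-owl="http://dbpedia.org/ontology/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Example_Agency">
    <dbpedia-owl:abstract>An example abstract.</dbpedia-owl:abstract>
    <dbpedia-owl:formationYear>1970</dbpedia-owl:formationYear>
  </rdf:Description>
</rdf:RDF>
XML;

$xml = simplexml_load_string($rdf);
$namespaces = $xml->getNamespaces(true); // prefix => URI, collected from the whole document

// Step 1: switch to the rdf namespace and take the first Description element.
$description = $xml->children($namespaces['rdf'])->Description[0];

// Step 2: from that element, switch to the dbpedia-owl namespace.
$owl = $description->children($namespaces['dbpedia-owl']);

$abstract = (string) $owl->abstract;
$year     = (string) $owl->formationYear;

echo $abstract, "\n", $year, "\n";
```

Each children() call only exposes the elements of the namespace it is given, which is why a second call is needed once you are inside the Description element.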

Related

attributes in root cause xml file to not load by simplexml php

This is a follow-up to my previous question, which has been solved. Here is the link to it:
using data from child element to select data in other element using simplexml in php
Thanks to @RomanPerekhrest for solving this.
I have this piece of PHP code that uses SimpleXML to read my XML file:
<?php
$xml = simplexml_load_file('../uploads/reports/report.xml');
$hits = $xml->xpath("results/hits/@rule_id");
$ruleIds = array_map(function($v){ // getting search path for each needed rule
return "profile_info/rules/rule[@id='". (string)$v. "']";
}, $hits);
foreach ($xml->xpath(implode(" | ", $ruleIds)) as $rule) {
echo '<div id="name">'. $rule->display_name .'</div>'.
'<div id="comment">'. $rule->display_comment .'</div>';
}
?>
Again, thanks to @RomanPerekhrest for coming up with this.
This code works fine with the simplified XML file I created to illustrate the problem in my previous question, but when I apply it to the real file, nothing is rendered.
I've found the reason: the root element has some xmlns attributes that stop the queries from matching. When I manually remove these attributes, everything works as expected. (I will not list the entire XML document, since it is 8500+ lines long.)
Here is the root element with the attributes:
<report xsi:schemaLocation="http://www.callassoftware.com/namespace/pi4 pi4_results_schema.xsd" xmlns="http://www.callassoftware.com/namespace/pi4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
I need a way to work around this in PHP. These XML files are generated by other software that offers no settings for the output, so I cannot simply produce them without these attributes.
Thanks

Your XML has a default namespace declared at the root element, which descendant elements without a prefix inherit implicitly:
xmlns="http://www.callassoftware.com/namespace/pi4"
To reference an element in the default namespace, you need to map a prefix to the default namespace URI, and then use that prefix in your XPath:
//register prefix 'd' to reference default namespace URI
$xml->registerXPathNamespace('d', 'http://www.callassoftware.com/namespace/pi4');
//use the prefix to reference elements in the default namespace
$hits = $xml->xpath("d:results/d:hits/@rule_id");
$ruleIds = array_map(function($v){ // getting search path for each needed rule
return "d:profile_info/d:rules/d:rule[@id='". (string)$v. "']";
}, $hits);
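To see the prefix mapping end to end, here is a small self-contained sketch. The inline report is a made-up miniature of the real file (only the namespace URI comes from the question), and the rule IDs are assumed to be simple strings without quotes.

```php
<?php
// Hypothetical, heavily trimmed version of the report; only the namespace URI is real.
$report = <<<XML
<report xmlns="http://www.callassoftware.com/namespace/pi4">
  <profile_info>
    <rules>
      <rule id="R1"><display_name>Rule one</display_name></rule>
      <rule id="R2"><display_name>Rule two</display_name></rule>
    </rules>
  </profile_info>
  <results>
    <hits rule_id="R2"/>
  </results>
</report>
XML;

$xml = simplexml_load_string($report);

// Map the prefix 'd' to the default namespace URI for XPath queries on $xml.
$xml->registerXPathNamespace('d', 'http://www.callassoftware.com/namespace/pi4');

$names = [];
foreach ($xml->xpath('d:results/d:hits/@rule_id') as $ruleId) {
    // Assumes IDs contain no quote characters; they are not escaped here.
    $rules = $xml->xpath("d:profile_info/d:rules/d:rule[@id='{$ruleId}']");
    foreach ($rules as $rule) {
        $names[] = (string) $rule->display_name;
    }
}

print_r($names);
```

Note that the attribute names (@rule_id, @id) stay unprefixed: unprefixed attributes are not placed in the default namespace, only elements are.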

PHP DOMDocument XML validation - default namespace - element not expected

I am trying to validate this document in PHP using DOMDocument's schemaValidate:
<?xml version="1.0" encoding="UTF-8"?> <works xmlns="http://pbn.nauka.gov.pl/-/ns/bibliography" pbn-unit-id="1388"><article><title>Mukowiscydoza</title></article></works>
by using $domDocument->schemaValidate('pbn-report.xsd')
Link to XSD:
https://pbn.nauka.gov.pl/help/images/files/pbn-report.xsd.zip
... and I always get an error
Error 1871: Element 'article': This element is not expected. Expected
is one of ( {http://pbn.nauka.gov.pl/-/ns/bibliography}article,
{http://pbn.nauka.gov.pl/-/ns/bibliography}book,
{http://pbn.nauka.gov.pl/-/ns/bibliography}chapter ). on line 0
I don't understand this. Why do I get an error when I have declared the default namespace?

Solved.
It turns out that when you build a DOMDocument, you need to pass the namespace every time you add an element. It makes no difference to the generated document (saveXML), but when you run schemaValidate, the validator checks the DOMDocument object, not the generated XML.
In other words this code:
$domDocument = new DOMDocument('1.0', "UTF-8");
$domWorks = $domDocument->createElementNS("http://pbn.nauka.gov.pl/-/ns/bibliography",'works');
$domWorksId = $domDocument->createAttribute('pbn-unit-id');
$domWorksId->value = PBNID;
$domWorks->appendChild($domWorksId);
$domDocument->appendChild($domWorks);
$domArticle = $domDocument->createElement('article');
$domArticle->appendChild($domDocument->createElement('title','Mukowiscydoza'));
$domWorks->appendChild($domArticle);
echo htmlentities($domDocument->saveXML());
generates the same XML as this code
$domDocument = new DOMDocument('1.0', "UTF-8");
$domWorks = $domDocument->createElementNS("http://pbn.nauka.gov.pl/-/ns/bibliography",'works');
$domWorksId = $domDocument->createAttribute('pbn-unit-id');
$domWorksId->value = PBNID;
$domWorks->appendChild($domWorksId);
$domDocument->appendChild($domWorks);
$domArticle = $domDocument->createElementNS("http://pbn.nauka.gov.pl/-/ns/bibliography",'article');
$domArticle->appendChild($domDocument->createElementNS("http://pbn.nauka.gov.pl/-/ns/bibliography",'title','Mukowiscydoza'));
$domWorks->appendChild($domArticle);
echo htmlentities($domDocument->saveXML());
But if you validate against the schema with
$domDocument->schemaValidate('pbn-report.xsd');
the first version reports an error.
Strange ...

Strange ...
Well not really. As long as the document is in memory, the information about the namespace(s) with the elements is preserved.
In that case the two different methods / parameter here really make a difference even if you don't see a difference in the generated XML (afterwards):
// null namespace
$domArticle = $domDocument->createElement('article');
// vs. concrete namespace
$domArticle = $domDocument->createElementNS(
'http://pbn.nauka.gov.pl/-/ns/bibliography', 'article'
);
You then serialize the document as XML (what you describe as "generates the same XML") and load that XML back into memory. At that point the elements that had no namespace are no longer in the null namespace, because on parsing they inherit the default namespace declared on their parent element.
So you must distinguish between the document and its elements in memory (DOM) and in the serialized form (string, file).
You can see similar effects when you do XSLT transformations. So if you experience something strange, it's worth considering that the document in memory may not represent what you first think, even if it produces similar, or even identical, XML ;)
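The difference between the in-memory tree and the serialized form can be checked directly through the namespaceURI property. This is only a sketch of the effect described above; it does not run the schema validation itself, since the XSD is not included here.

```php
<?php
$ns = 'http://pbn.nauka.gov.pl/-/ns/bibliography';

$doc = new DOMDocument('1.0', 'UTF-8');
$works = $doc->appendChild($doc->createElementNS($ns, 'works'));

// createElement() puts the new element in the null namespace,
// even though its parent is in the bibliography namespace.
$article = $works->appendChild($doc->createElement('article'));
$inMemory = $article->namespaceURI;

// Serialize and parse again: the parser now applies the default
// namespace declared on <works> to the unprefixed <article>.
$reloaded = new DOMDocument();
$reloaded->loadXML($doc->saveXML());
$afterReload = $reloaded->documentElement->firstChild->namespaceURI;

var_dump($inMemory);    // namespace of <article> as built in memory
var_dump($afterReload); // namespace of <article> after a save/load round trip
```

So one workaround, besides using createElementNS() everywhere, is to validate a freshly loaded copy of the serialized document rather than the tree you built by hand.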

Try putting the xmlns declaration on the article element as well, then try again:
xmlns="http://pbn.nauka.gov.pl/-/ns/bibliography"

Can't access XML node via xpath() (YT channel feed)

Very stumped by this one. In PHP, I'm fetching a YouTube user's vids feed and trying to access the nodes, like so:
$url = 'http://gdata.youtube.com/feeds/api/users/HCAFCOfficial/uploads';
$xml = simplexml_load_file($url);
So far, so fine. Really basic stuff. I can see the data comes back by running:
echo '<p>Found '.count($xml->xpath('*')).' nodes.</p>'; //41
echo '<textarea>';print_r($xml);echo '</textarea>';
Both print what I would expect, and the print_r replicates the XML structure.
However, I have no idea why this is returning zero:
echo '<p>Found '.count($xml->xpath('entry')).'"entry" nodes.</p>';
There blatantly are entry nodes in the XML. This is confirmed by running:
foreach($xml->xpath('*') as $node) echo '<p>['.$node->getName().']</p>';
...which duly outputs "[entry]" 25 times. So perhaps this is a bug in SimpleXML? This is part of a wider feed caching system and I'm not having any trouble with other, non-YT feeds, only YT ones.
[UPDATE]
This question shows that it works if you do
count($xml->entry)
But I'm curious as to why count($xml->xpath('entry')) doesn't also work...
[Update 2]
I can happily traverse YT's alternate feed format just fine:
http://gdata.youtube.com/feeds/base/users/{user id}/uploads?alt=rss&v=2

This is happening because the feed is an Atom document with a defined default namespace.
<feed xmlns="http://www.w3.org/2005/Atom" ...
Since a namespace is defined, you have to define it for your xpath call too. Doing something like this works:
$url = 'http://gdata.youtube.com/feeds/api/users/HCAFCOfficial/uploads';
$xml = simplexml_load_file($url);
$xml->registerXPathNamespace('ns', 'http://www.w3.org/2005/Atom');
$results = $xml->xpath('ns:entry');
echo count($results);
The main thing to know here is that SimpleXML respects any and all defined namespaces and you need to handle them accordingly, including the default namespace. You'll notice that the second feed you listed does not define a default namespace and so the xpath call works fine as is.
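The gdata URL above may no longer respond, so here is a sketch of the same behaviour against a tiny inline Atom document (the feed content is made up; only the Atom namespace URI is real):

```php
<?php
// Hypothetical miniature Atom feed with a default namespace.
$atom = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>
XML;

$xml = simplexml_load_string($atom);

// XPath: an unprefixed name means "no namespace", so nothing matches.
$unprefixed = count($xml->xpath('entry'));

// XPath: after mapping a prefix to the Atom namespace, the entries are found.
$xml->registerXPathNamespace('ns', 'http://www.w3.org/2005/Atom');
$prefixed = count($xml->xpath('ns:entry'));

// Property access: SimpleXML also matches elements in the default namespace.
$viaProperty = count($xml->entry);

echo "$unprefixed $prefixed $viaProperty\n";
```

This also explains the observation in the question's update: $xml->entry and xpath('entry') follow different matching rules, so one can succeed while the other returns an empty result.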

Get XML Attribute with SimpleXML

I'm trying to get the $xml->entry->yt:statistics->attributes()->viewCount attribute, and I've tried some stuff with SimpleXML, and I can't really get it working!
Attempt #1
<?php
$xml = simplexml_load_file("http://gdata.youtube.com/feeds/api/videos?author=Google");
echo $xml->entry[0]->yt:statistics['viewCount'];
?>
Attempt #2
<?php
$xml = simplexml_load_file("http://gdata.youtube.com/feeds/api/videos?author=Google");
echo $xml->entry[0]->yt:statistics->attributes()->viewCount;
?>
Both return blank, even though SimpleXML itself is working: I tried to get the feed's title, and that worked!
Any ideas?
I've looked at loads of other examples on SO and other sites, but somehow this isn't working. Does PHP treat the ':' as a cut-off, or am I just doing something stupid?
Thank you, any responses greatly appreciated!

If you just want the view count of a YouTube video, you have to specify the video ID, which is found in each video URL. For example, in "http://www.youtube.com/watch?v=ccI-MugndOU" the ID is ccI-MugndOU. To get the view count, try the code below:
$sample_video_ID = "ccI-MugndOU";
$JSON = file_get_contents("http://gdata.youtube.com/feeds/api/videos?q={$sample_video_ID}&alt=json");
$JSON_Data = json_decode($JSON);
$views = $JSON_Data->{'feed'}->{'entry'}[0]->{'yt$statistics'}->{'viewCount'};
echo $views;

I would use the Gdata component from the Zend Framework. It is also available as a separate module, so you don't need to use the whole of Zend.

The yt: prefix marks that element as being in a different "XML namespace" from the rest of the document. You have to tell SimpleXML to switch to that namespace using the ->children() method.
The line you were attempting should actually look like this:
echo (string)$xml->entry[0]->children('yt', true)->statistics->attributes(NULL)->viewCount;
To break this down:
(string) - this is just a good habit: you want the string contents of the attribute, not a SimpleXML object representing it
$xml->entry[0] - as expected
->children('yt', true) - switch to the namespace with the local alias 'yt'
->statistics - as expected
->attributes(NULL) - technically, the attribute "viewCount" is in no namespace at all, because it is not prefixed with "yt:" (unprefixed attributes never inherit a namespace from their element), so we have to switch back in order to see it
->viewCount - running ->attributes() gives us nothing but attributes, which are accessed with ->foo not ['foo']
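Since the gdata feed may not be reachable any more, the chain above can be tried against an inline sample. The XML below is a made-up fragment shaped like the feed; the yt namespace URI and the viewCount value are assumptions for illustration.

```php
<?php
// Hypothetical fragment shaped like the YouTube feed from the question.
$feed = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:yt="http://gdata.youtube.com/schemas/2007">
  <title>Sample videos</title>
  <entry>
    <title>Sample video</title>
    <yt:statistics favoriteCount="3" viewCount="12345"/>
  </entry>
</feed>
XML;

$xml = simplexml_load_string($feed);

// children('yt', true): select child elements by namespace prefix.
$statistics = $xml->entry[0]->children('yt', true)->statistics;

// attributes(): called without arguments to reach the un-namespaced attributes.
$viewCount = (string) $statistics->attributes()->viewCount;

// Elements in the default (Atom) namespace need no switch at all.
$title = (string) $xml->entry[0]->title;

echo $title, ': ', $viewCount, "\n";
```

Passing the prefix with true as the second argument is convenient, but the prefix is chosen by whoever writes the feed; passing the full namespace URI instead is more robust if the prefix ever changes.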

Parsing XML (PHP)

I'm using SimpleXML. I want to get this node's text attribute:
<yweather:condition text="Mostly Cloudy" ......
I'm using this, but it's not working:
$xml->children("yweather", TRUE)->condition->attributes()->text;

Do a print_r() on $xml to see how the structure looks. From there you should be able to see how to access the information.

It looks like you are trying to access an attribute. Attributes are exposed through attributes(), whose result can be indexed like an array:
$attributes = $xml->condition->attributes();
$weather = $attributes['text'];
To deal with the namespace, you need to use children() to get the members of that namespace.
$weather_items = $xml->channel->item->children("http://xml.weather.yahoo.com/ns/rss/1.0");
It might help to mention that the string you showed is part of a feed, specifically the RSS formatted Yahoo Weather feed.
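Putting the two pieces together, a sketch against an inline sample might look like this. The document below is a made-up skeleton of the feed, assuming the condition element sits under channel/item as in the line above.

```php
<?php
// Hypothetical skeleton of the Yahoo Weather RSS feed.
$rss = <<<XML
<rss version="2.0" xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
  <channel>
    <item>
      <title>Conditions</title>
      <yweather:condition text="Mostly Cloudy"/>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($rss);

// Navigate to the parent of the namespaced element first,
// then switch to the yweather namespace by its URI.
$weatherItems = $xml->channel->item->children('http://xml.weather.yahoo.com/ns/rss/1.0');

// The text attribute has no prefix, so plain attributes() reaches it.
$text = (string) $weatherItems->condition->attributes()->text;

echo $text, "\n";
```

The attempt in the question calls children() on the root element, but condition is not a direct child of the root, which is probably why it returns nothing.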

You would probably use $xml->condition but there may be branches before that.
