PHP DOMDocument getting Attribute of Tag - php

Hello, I have an API response in XML format with a series of items such as this:
<item>
<title>blah balh</title>
<pubDate>Tue, 20 Oct 2009 </pubDate>
<media:file date="today" data="example text string"/>
</item>
I want to use DOMDocument to get the attribute "data" from the tag "media:file". My attempt below doesn't work:
$xmldoc = new DOMDocument();
$xmldoc->load('api response address');
foreach ($xmldoc->getElementsByTagName('item') as $feeditem) {
$nodes = $feeditem->getElementsByTagName('media:file');
$linkthumb = $nodes->item(0)->getAttribute('data');
}
What am I doing wrong? Please help.
EDIT: I can't leave comments for some reason Mark. I get the error
Call to a member function getAttribute() on a non-object
when I run my code. I have also tried
$nodes = $feeditem->getElementsByTagNameNS('uri','file');
$linkthumb = $nodes->item(0)->getAttribute('data');
where uri is the URI of the media namespace (NS), but I get the same problem again.
Note that the media element is self-closing (<media:file ... />) rather than having a separate closing tag. I think this is part of the problem, as I generally have no issue parsing for attributes.

The example you provided should not generate an error. I tested it and $linkthumb contained the string "example text string" as expected.
Ensure the media namespace is declared in the returned XML; otherwise DOMDocument will report an error.
If you are getting a specific error, please edit your post to include it.
Edit:
Try the following code:
$xmldoc = new DOMDocument();
$xmldoc->load('api response address');
foreach ($xmldoc->getElementsByTagName('item') as $feeditem) {
$nodes = $feeditem->getElementsByTagName('file');
$linkthumb = $nodes->item(0)->getAttribute('data');
echo $linkthumb;
}
You may also want to look at SimpleXML and XPath, as they make reading XML much easier than DOMDocument.
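The "non-object" error in the question means item(0) returned null, that is, no matching element was found for that item. A minimal sketch of a namespace-aware lookup with a null check follows. The feed content and the namespace URI http://example.com/media are assumptions made up for illustration; substitute the URI that the real feed declares for the media prefix.

```php
<?php
// Hypothetical feed: the namespace URI below is an assumption, not the real one.
$xml = <<<XML
<rss xmlns:media="http://example.com/media">
  <item>
    <title>with media</title>
    <media:file date="today" data="example text string"/>
  </item>
  <item>
    <title>without media</title>
  </item>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$results = [];
foreach ($doc->getElementsByTagName('item') as $item) {
    // Match on namespace URI plus local name, independent of the prefix used.
    $files = $item->getElementsByTagNameNS('http://example.com/media', 'file');
    $file = $files->item(0); // null when the item has no media:file element
    $results[] = $file !== null ? $file->getAttribute('data') : null;
}

var_dump($results);
```

Checking for null before calling getAttribute() avoids the fatal error when some items lack the element.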

Alternatively,
$DOMNode -> attributes -> getNamedItem( 'MyAttribute' ) -> value;
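A short sketch of that alternative, assuming a made-up single-element document. getNamedItem() returns null for an attribute that does not exist, so the value is read only after a check.

```php
<?php
// Illustrative element only; attribute names mirror the question's example.
$doc = new DOMDocument();
$doc->loadXML('<file date="today" data="example text string"/>');
$node = $doc->documentElement;

// Look up an attribute node by name, then read its value.
$attr = $node->attributes->getNamedItem('data');
$value = $attr !== null ? $attr->value : null;

// A missing attribute yields null rather than an empty attribute node.
$missing = $node->attributes->getNamedItem('nope');

var_dump($value, $missing);
```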

Related

PHP not returning full XML file contents [duplicate]

I'm trying to read an RSS feed from Flickr but it has some nodes which are not readable by Simple XML (media:thumbnail, flickr:profile, and so on).
How do I get round this? My head hurts when I look at the documentation for the DOM. So I'd like to avoid it as I don't want to learn.
I'm trying to get the thumbnail by the way.
The solution is explained in this nice article. You need the children() method for accessing XML elements which contain a namespace. This code snippet is quoted from the article:
$feed = simplexml_load_file('http://www.sitepoint.com/recent.rdf');
foreach ($feed->item as $item) {
$ns_dc = $item->children('http://purl.org/dc/elements/1.1/');
echo $ns_dc->date;
}
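Since the question asks for the thumbnail, here is a hedged sketch of the same children() technique applied to an attribute on a namespaced element. The feed below and the namespace URI http://example.com/media are invented for illustration; a real feed declares its own URI for the media prefix.

```php
<?php
// Hypothetical feed; the namespace URI is an assumption for this sketch.
$xml = <<<XML
<rss xmlns:media="http://example.com/media">
  <channel>
    <item>
      <title>Photo</title>
      <media:thumbnail url="http://example.com/t.jpg" width="75"/>
    </item>
  </channel>
</rss>
XML;

$feed = simplexml_load_string($xml);

$urls = [];
foreach ($feed->channel->item as $item) {
    // Select the child elements that live in the media namespace.
    $media = $item->children('http://example.com/media');
    // The attributes themselves carry no prefix, so request them explicitly.
    $urls[] = (string) $media->thumbnail->attributes()->url;
}

var_dump($urls);
```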
With the latest version, you can now reference colon nodes with curly brackets.
$item->{'itunes:duration'}
You're dealing with a namespace? I think you need to use the ->children method.
$ns_dc = $item->children('http://namespace.org/');
Can you provide a snippet with the xml declaration?
An even simpler way to access namespaced XML nodes in PHP, without declaring a namespace, is shown below.
To get the value of <su:authorEmail> from the following source:
<item>
<title>My important article</title>
<pubDate>Mon, 29 Feb 2017 00:00:00 +0000</pubDate>
<link>https://myxmlsource.com/32984</link>
<guid>https://myxmlsource.com/32984</guid>
<author>Blogs, Jo</author>
<su:departments>
<su:department>Human Affairs</su:department>
</su:departments>
<su:authorHash>4f329b923419b3cb2c654d615e22588c</su:authorHash>
<su:authorEmail>hIwW14tLc+4l/oo7agmRrcjwe531u+mO/3IG3xe5jMg=</su:authorEmail>
<dc:identifier>/32984/Download/0032984-11042.docx</dc:identifier>
<dc:format>Journal article</dc:format>
<dc:creator>Blogs, Jo</dc:creator>
<slash:comments>0</slash:comments>
</item>
Use the following code:
$rss = new DOMDocument();
$rss->load('https://myxmlsource.com/rss/xml');
$nodes = $rss->getElementsByTagName('item');
foreach ($nodes as $node) {
$title = $node->getElementsByTagName('title')->item(0)->nodeValue;
$author = $node->getElementsByTagName('author')->item(0)->nodeValue;
$authorHash = $node->getElementsByTagName('authorHash')->item(0)->nodeValue;
$department = $node->getElementsByTagName('department')->item(0)->nodeValue;
$email = decryptEmail($node->getElementsByTagName('authorEmail')->item(0)->nodeValue);
}

Is it possible to get an attribute's value and the text within a node at the same time in XPath 1.0?

I've tried the solution here: Getting attribute using XPath
but it gives me an error.
I have some XHTML like this:
<a href="link.php">Click me!</a>
I'm recursively parsing the XML and trying to get both the href attribute (link.php) and the link text (Click me!) at the same time.
<?php
$node = $xpath->query('string(self::a/@href) | self::a/text()', $nodes->item(0));
This code throws the following error:
Warning: DOMXPath::query(): Invalid type
If I do either of these two separately they work, but not together:
<?php
$node = $xpath->evaluate('string(self::a/@href)', $nodes->item(0));
$node = $xpath->query('self::a/text()', $nodes->item(0));
If I use the following I get the whole attribute (href="link.php"), not just its value:
<?php
$node = $xpath->query('self::a/@href | self::a/text()', $nodes->item(0));
Is there any way of getting both text values at the same time using XPath 1.0 in PHP?
As suggested by others, you can use concat() (PHP's XPath supports it; see the demo below) to combine the value of an attribute with the content of an element.
Judging from your use of self::a, the problem with the XPath others suggested was probably that the context node ($nodes->item(0)) is already the <a> element. Relative to that context node, a/@href means "the href attribute of a child element a of the current element", which is why you got no match. You were correct to use self::a in this case; alternatively, just use . to reference the current context node:
$doc = new DOMDocument();
$xml = <<<XML
<root>
<a href="link.php">Click me!</a>
</root>
XML;
$doc->loadXML($xml);
$xpath = new DOMXpath($doc);
$nodes = $xpath->query('//a');
$node = $xpath->evaluate('concat(@href, "|", .)', $nodes->item(0));
echo $node;
Output:
link.php|Click me!
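The original union fails because the XPath 1.0 | operator only accepts node-sets, while string() returns a string. If a joined string with a delimiter is inconvenient, one alternative is two evaluate() calls against the same context node. This is a sketch on a made-up one-link document, not the asker's actual markup.

```php
<?php
// Illustrative document containing a single link.
$doc = new DOMDocument();
$doc->loadXML('<root><a href="link.php">Click me!</a></root>');
$xpath = new DOMXPath($doc);

// The context node is the <a> element itself.
$link = $xpath->query('//a')->item(0);

// evaluate() returns a plain PHP string for string-valued expressions.
$href = $xpath->evaluate('string(@href)', $link);
$text = $xpath->evaluate('string(.)', $link);

var_dump($href, $text);
```

Keeping the two values separate avoids having to split on a delimiter that might also occur in the link text.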

XPath Substring-After Help / Query/Evaluate?

I'm building a PHP script to transfer selected contents of an XML file to an SQL database.
One of the hardcoded XML contents is formatted like this:
<visualURL>
id=18144083|img=http://upload.wikimedia.org/wikipedia/en/8/86/Holyrollernovacaine.jpg
</visualURL>
And I'm looking for a way to just get the contents of the URL (all text after img=).
$Image = $xpath->query("substring-after(/Playlist/PlaylistEntry[1]/visualURL[1]/text(), 'img=')", $element)->item(0)->nodeValue;
This displays a property-of-non-object error in my PHP output.
There must be another way to just extract the URL contents using XPath that I want, no?
Any help would be greatly appreciated!
EDIT:
Here is the minimum code
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->loadXML('<Playlist>
<PlaylistEntry>
<visualURL>
id=12582194|img=http://upload.wikimedia.org/wikipedia/en/9/96/Sometime_around_midnight.jpg
</visualURL>
</PlaylistEntry>
</Playlist>');
$xpath = new DOMXpath($xmlDoc);
$elements = $xpath->query("/Playlist/PlaylistEntry[1]");
if (!is_null($elements))
foreach ($elements as $element)
$Image = $xpath->query("substring-after(/Playlist/PlaylistEntry[1]/visualURL[1]/text(), 'img=')", $element)->item(0)->nodeValue;
print "Finished Item: $Image";
?>
EDIT 2:
After some research I believe I must use
$xpath->evaluate
instead of my current use of
$xpath->query
see this link
Same XPath query is working with Google docs but not PHP
I'm not exactly sure how to do this yet, but I will investigate more in the morning. Again, any help would be appreciated.
You're heading in the right direction. Use DOMXPath::evaluate() for XPath expressions that don't return nodes, such as substring-after() (it returns a string, as documented in the linked page). The following code prints the expected output:
$xmlDoc = new DOMDocument();
$xml = <<<XML
<Playlist>
<PlaylistEntry>
<visualURL>
id=12582194|img=http://upload.wikimedia.org/wikipedia/en/9/96/Sometime_around_midnight.jpg
</visualURL>
</PlaylistEntry>
</Playlist>
XML;
$xmlDoc->loadXML($xml);
$xpath = new DOMXpath($xmlDoc);
$elements = $xpath->query("/Playlist/PlaylistEntry");
foreach ($elements as $element) {
$Image = $xpath->evaluate("substring-after(visualURL, 'img=')", $element);
print "Finished Item: $Image <br>";
}
output :
Finished Item: http://upload.wikimedia.org/wikipedia/en/9/96/Sometime_around_midnight.jpg
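Because the text inside visualURL sits on its own line, the string returned by substring-after() keeps the trailing newline. A small sketch, assuming a made-up URL, wraps the expression in the XPath 1.0 function normalize-space() to trim it before the value goes into a database.

```php
<?php
// Illustrative document; the URL is a placeholder.
$doc = new DOMDocument();
$doc->loadXML('<Playlist><PlaylistEntry><visualURL>
id=1|img=http://example.com/a.jpg
</visualURL></PlaylistEntry></Playlist>');
$xpath = new DOMXPath($doc);

$entry = $xpath->query('/Playlist/PlaylistEntry')->item(0);

// Raw result still carries the whitespace that followed the URL.
$raw = $xpath->evaluate("substring-after(visualURL, 'img=')", $entry);

// normalize-space() strips leading and trailing whitespace.
$clean = $xpath->evaluate("normalize-space(substring-after(visualURL, 'img='))", $entry);

var_dump($raw, $clean);
```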

can't access xml node PHP

I have a page in php where I have to parse an xml.
I have done this for example:
$hotelNodes = $xml_data->getElementsByTagName('Hotel');
foreach($hotelNodes as $hotel){
$supplementsNodes2 = $hotel->getElementsByTagName('BoardBase');
foreach($supplementsNodes2 as $suppl2) {
echo'<p>HERE</p>'; //not enter here
}
}
}
In this code I access each hotel in my XML, and for each hotel I would like to search for the tag BoardBase, but it doesn't enter the inner loop.
This is my XML (with many parts cut out):
<hotel desc="DESC" name="Hotel">
<selctedsupplements>
<boardbases>
<boardbase bbpublishprice="0" bbprice="0" bbname="Colazione Continentale" bbid="1"></boardbase>
</boardbases>
</selctedsupplements>
</occupancy></occupancies>
</hotel>
Many nodes don't have BoardBase, but even when it is present the loop is not entered.
Is it possible that this node isn't accessible?
This XML is received from a server with a SoapClient.
If I inspect the XML printed in Firebug I can see the node displayed with reduced opacity.
I have also tried this:
$supplementsNodes2 = $hotel->getElementsByTagName('boardbase');
but without success
Two issues I can see from the get-go. First, XML names are case-sensitive, hence:
$hotelNodes = $xml_data->getElementsByTagName('Hotel');
Can't work, because your xml node looks like:
<hotel desc="DESC" name="Hotel">
hotel => lower-case!
As you can see here:
[...] names for such things as elements, while XML is explicitly case sensitive.
The official specs specify tag names as case-sensitive, so getElementsByTagName('FOO') won't return the same elements as getElementsByTagName('foo')...
Secondly, you seem to have some tag-soup going on:
</occupancy></occupancies>
<!-- tag names don't match, both are closing tags -->
This is just plain invalid markup, it should read either:
<occupancy></occupancy>
or
<occupancies></occupancies>
Those would be the first two ports of call.
I've set up a quick codepad using this code:
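One way to catch markup problems like the mismatched closing tags early is to let libxml collect its errors and inspect them after loading. This is a minimal sketch with a deliberately broken, made-up fragment; with a SoapClient response you would pass the received XML string instead.

```php
<?php
// Collect parse errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
// Deliberately malformed: </occupancy> closes an element that was never opened.
$ok = $dom->loadXML('<hotel><occupancies></occupancy></occupancies></hotel>');

$errors = libxml_get_errors();
libxml_clear_errors();

if (!$ok) {
    foreach ($errors as $error) {
        // Each entry is a LibXMLError with a message and a line number.
        echo trim($error->message), ' (line ', $error->line, ')', PHP_EOL;
    }
}
```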
$xml = '<hotel desc="DESC" name="Hotel">
<selctedsupplements>
<boardbases>
<boardbase bbpublishprice="0" bbprice="0" bbname="Colazione Continentale" bbid="1"></boardbase>
</boardbases>
</selctedsupplements>
<occupancy></occupancy>
</hotel>';
$dom = new DOMDocument;
$dom->loadXML($xml);
$badList = $dom->getElementsByTagName('Hotel');
$correctList = $dom->getElementsByTagName('hotel');
echo sprintf("%d",$badList->length),
' compared to ',
$correctList->length, PHP_EOL;
The output was "0 compared to 1", meaning that using a lower-case selector returned 1 element, the one with the upper-case H returned an empty list.
To get to the boardbase tags for each hotel tag, you just have to write this:
$hotels = $dom->getElementsByTagName('hotel');
foreach($hotels as $hotel)
{
$supplementsNodes2 = $hotel->getElementsByTagName('boardbase');
foreach($supplementsNodes2 as $node)
{
var_dump($node);//you _will_ get here now
}
}
As you can see on this updated codepad.
Alessandro, your XML is a mess (= un casino); you really need to get that straight. Elias' answer pointed out some very basic things to consider.
I built on the codepad Elias set up, and it works perfectly for me:
$dom = new DOMDocument;
$dom->loadXML($xml);
$hotels = $dom->getElementsByTagName('hotel');
foreach ($hotels as $hotel) {
$bbs = $hotel->getElementsByTagName('boardbase');
foreach ($bbs as $bb) echo $bb->getAttribute('bbname');
}
see http://codepad.org/I6oxkEOC
