Simple XML - Dealing With Colons In Nodes - php

I'm trying to read an RSS feed from Flickr, but it has some nodes that SimpleXML can't read (media:thumbnail, flickr:profile, and so on).
How do I get around this? The DOM documentation makes my head hurt, so I'd rather avoid it.
Specifically, I'm trying to get the thumbnail.

The solution is explained in this article: you need the children() method, passing the namespace URI, to access XML elements that belong to a namespace. This snippet is quoted from the article:
$feed = simplexml_load_file('http://www.sitepoint.com/recent.rdf');
foreach ($feed->item as $item) {
    // Switch to the Dublin Core namespace to read <dc:date>
    $ns_dc = $item->children('http://purl.org/dc/elements/1.1/');
    echo $ns_dc->date;
}
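Since the question is about Flickr thumbnails, here is a minimal sketch of the same children() technique applied to <media:thumbnail>. It assumes the feed declares the Media RSS namespace URI http://search.yahoo.com/mrss/ for the media: prefix (Flickr's feeds use it, but check your feed's root element); the sample feed and the function name thumbnailUrls() are hypothetical. Because url is an attribute rather than a child element, it is read with attributes().

```php
<?php
// Hypothetical, trimmed-down Flickr-style feed; a real one has many more elements.
$rss = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Sunset</title>
      <media:thumbnail url="https://example.com/thumb_s.jpg" height="75" width="75"/>
    </item>
  </channel>
</rss>
XML;

// Assumed Media RSS namespace URI; copy the xmlns:media value from your own feed.
const MEDIA_NS = 'http://search.yahoo.com/mrss/';

/** Returns the url attribute of <media:thumbnail> for every item that has one. */
function thumbnailUrls(string $xmlString): array
{
    $feed = simplexml_load_string($xmlString);
    $urls = [];
    foreach ($feed->channel->item as $item) {
        // children() with the namespace URI switches SimpleXML into the media namespace
        $media = $item->children(MEDIA_NS);
        if (isset($media->thumbnail)) {
            // url is an un-prefixed attribute, so attributes() is called without a namespace
            $urls[] = (string) $media->thumbnail->attributes()->url;
        }
    }
    return $urls;
}

print_r(thumbnailUrls($rss));
```

Using attributes() instead of $media->thumbnail['url'] matters here: array-style access on an element reached through children(MEDIA_NS) looks for attributes in that same namespace, and the url attribute has none.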

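You can also reference colon nodes with curly braces:
$item->{'itunes:duration'}
Note that this only works when the prefix is not declared as a namespace in the document (libxml then keeps the colon as part of the element name). When the namespace is declared, as in a well-formed feed, it returns an empty result, and children() is the reliable approach.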
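A minimal sketch of the difference, assuming a podcast-style item whose itunes: prefix is declared on the root element; the sample feed and itunesDuration() are hypothetical. Passing true as the second argument to children() selects the namespace by its prefix, which is the closest safe equivalent of the curly-brace syntax:

```php
<?php
// Hypothetical podcast feed; the itunes prefix is properly declared on <rss>.
$podcast = <<<XML
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <item>
      <title>Episode 1</title>
      <itunes:duration>00:42:10</itunes:duration>
    </item>
  </channel>
</rss>
XML;

function itunesDuration(string $xmlString): string
{
    $item = simplexml_load_string($xmlString)->channel->item[0];

    // With a declared namespace the element's local name is just "duration",
    // so $item->{'itunes:duration'} finds nothing and casts to ''.

    // children('itunes', true) selects the namespace by prefix instead of by URI.
    return (string) $item->children('itunes', true)->duration;
}

echo itunesDuration($podcast), "\n"; // prints 00:42:10
```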

It sounds like you're dealing with a namespace. You need the ->children() method, passing the namespace URI:
$ns_dc = $item->children('http://namespace.org/');
Can you post a snippet of the XML that includes the namespace declarations?
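If you don't know the namespace URI, SimpleXML can list the prefixes the document declares, which tells you exactly what to pass to children(). A minimal sketch; the sample feed and listNamespaces() are hypothetical:

```php
<?php
// Hypothetical feed root declaring two prefixes.
$feed = <<<XML
<rss version="2.0"
     xmlns:media="http://search.yahoo.com/mrss/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel><item><dc:date>2009-10-20</dc:date></item></channel>
</rss>
XML;

/** Returns a prefix => namespace-URI map of every declaration in the document. */
function listNamespaces(string $xmlString): array
{
    // true = also include declarations made on descendant elements, not only the root
    return simplexml_load_string($xmlString)->getDocNamespaces(true);
}

print_r(listNamespaces($feed));

// The URI for a prefix can then be passed straight to children():
$ns = listNamespaces($feed);
$item = simplexml_load_string($feed)->channel->item;
echo $item->children($ns['dc'])->date, "\n"; // prints 2009-10-20
```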

An even simpler way to access namespaced XML nodes in PHP, without declaring the namespace, is DOMDocument's getElementsByTagName() with the element's local name.
For example, to get the value of <su:authorEmail> from the following source:
<item>
  <title>My important article</title>
  <pubDate>Mon, 29 Feb 2017 00:00:00 +0000</pubDate>
  <link>https://myxmlsource.com/32984</link>
  <guid>https://myxmlsource.com/32984</guid>
  <author>Blogs, Jo</author>
  <su:departments>
    <su:department>Human Affairs</su:department>
  </su:departments>
  <su:authorHash>4f329b923419b3cb2c654d615e22588c</su:authorHash>
  <su:authorEmail>hIwW14tLc+4l/oo7agmRrcjwe531u+mO/3IG3xe5jMg=</su:authorEmail>
  <dc:identifier>/32984/Download/0032984-11042.docx</dc:identifier>
  <dc:format>Journal article</dc:format>
  <dc:creator>Blogs, Jo</dc:creator>
  <slash:comments>0</slash:comments>
</item>
Use the following code:
$rss = new DOMDocument();
$rss->load('https://myxmlsource.com/rss/xml');
$nodes = $rss->getElementsByTagName('item');
foreach ($nodes as $node) {
    $title = $node->getElementsByTagName('title')->item(0)->nodeValue;
    $author = $node->getElementsByTagName('author')->item(0)->nodeValue;
    // Prefixed elements are looked up by their local name, without "su:"
    $authorHash = $node->getElementsByTagName('authorHash')->item(0)->nodeValue;
    $department = $node->getElementsByTagName('department')->item(0)->nodeValue;
    // decryptEmail() is an application-specific function (not shown), not a PHP built-in
    $email = decryptEmail($node->getElementsByTagName('authorEmail')->item(0)->nodeValue);
}
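The code above relies on getElementsByTagName() matching the local name (authorHash) rather than the prefixed name, and it calls ->nodeValue on item(0) even when an element is missing. If you'd rather match the namespace explicitly and get null for missing elements, DOMXPath with a registered prefix is one option. A minimal sketch; the namespace URI urn:example:su is a placeholder because the question doesn't show the real declaration, and firstText()/readItems() are hypothetical helper names:

```php
<?php
// Assumption: the real feed declares the su: prefix; "urn:example:su" is a placeholder URI.
const SU_NS = 'urn:example:su';

$source = <<<XML
<rss xmlns:su="urn:example:su">
  <channel>
    <item>
      <title>My important article</title>
      <su:departments>
        <su:department>Human Affairs</su:department>
      </su:departments>
      <su:authorHash>4f329b923419b3cb2c654d615e22588c</su:authorHash>
    </item>
  </channel>
</rss>
XML;

/** Text of the first node matching $query under $context, or null if there is none. */
function firstText(DOMXPath $xpath, string $query, DOMNode $context): ?string
{
    $nodes = $xpath->query($query, $context);
    // query() yields an empty list (not null) when nothing matches, so check the length
    return ($nodes !== false && $nodes->length > 0) ? $nodes->item(0)->textContent : null;
}

function readItems(string $xmlString): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xmlString);

    $xpath = new DOMXPath($doc);
    // Bind the prefix used in the queries below to the namespace URI.
    $xpath->registerNamespace('su', SU_NS);

    $items = [];
    foreach ($xpath->query('//item') as $item) {
        $items[] = [
            'title'      => firstText($xpath, 'title', $item),
            'department' => firstText($xpath, 'su:departments/su:department', $item),
            'authorHash' => firstText($xpath, 'su:authorHash', $item),
            'pubDate'    => firstText($xpath, 'pubDate', $item), // absent in this sample, so null
        ];
    }
    return $items;
}

print_r(readItems($source));
```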

Related

PHP not returning full XML file contents [duplicate]

Check if child exists? - SimpleXML (PHP)

I have several XML files, and for each one I renamed all the individual tags so that every XML file uses the same tag names. That was easy because the function was customized for each XML file.
But instead of writing 7 new functions, one for each XML file, I now want to check whether an XML file has a specified child or not, because if I write:
foreach ($items as $item) {
    $node = dom_import_simplexml($item);
    $title = $node->getElementsByTagName('title')->item(0)->textContent;
    $price = $node->getElementsByTagName('price')->item(0)->textContent;
    $url = $node->getElementsByTagName('url')->item(0)->textContent;
    $publisher = $node->getElementsByTagName('publisher')->item(0)->textContent;
    $category = $node->getElementsByTagName('category')->item(0)->textContent;
    $platform = $node->getElementsByTagName('platform')->item(0)->textContent;
}
I sometimes get: PHP Notice: Trying to get property of non-object in ...
For example, take two different XML files. The first contains publisher and category, the second contains platform instead:
XML 1:
<products>
  <product>
    <desc>This is a Test</desc>
    <price>11.69</price>
    <price_base>12.99</price_base>
    <publisher>Stackoverflow</publisher>
    <category>PHP</category>
    </packshot>
    <title>Check if child exists? - SimpleXML (PHP)</title>
    <url>http://stackoverflow.com/questions/ask</url>
  </product>
</products>
XML 2:
<products>
  <product>
    <image></image>
    <title>Questions</title>
    <price>23,90</price>
    <url>google.de</url>
    <platform>Stackoverflow</platform>
  </product>
</products>
So some XML files contain publisher, category and platform, and some don't. It can also happen that not every node within one XML file contains all of these elements, as in the first example.
So I need to check each node of an XML file individually for whether it contains publisher, category and/or platform.
How can I do that with SimpleXML?
I thought about using a switch statement, but first I need to check which children each node contains.
EDIT:
I may have found a solution. Is this a valid approach?
if ($node->getElementsByTagName('platform')->item(0)) {
    echo $node->getElementsByTagName('platform')->item(0)->textContent . "\n";
}
Greetings and Thank You!
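The check in the EDIT is a valid approach: DOMNodeList::item() returns null when the index is out of range, so testing it before reading ->textContent avoids the notice. A minimal sketch that wraps the check in a helper; childText() is a hypothetical name, and the sample is the question's second XML:

```php
<?php
/** Text of the first <$tag> descendant of $node, or null when there is none. */
function childText(DOMElement $node, string $tag): ?string
{
    $first = $node->getElementsByTagName($tag)->item(0);
    return $first === null ? null : $first->textContent;
}

$xml = simplexml_load_string(
    '<products><product><title>Questions</title><price>23,90</price>'
    . '<url>google.de</url><platform>Stackoverflow</platform></product></products>'
);

foreach ($xml->product as $item) {
    $node = dom_import_simplexml($item); // DOMElement view of the SimpleXML node
    $fields = [];
    foreach (['title', 'price', 'url', 'publisher', 'category', 'platform'] as $tag) {
        $fields[$tag] = childText($node, $tag);
    }
    print_r($fields); // publisher and category are null for this product
}
```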
There are many roads to Rome; here is one (working example):
$xml = "<products>
  <product>
    <desc>This is a Test</desc>
    <price>11.69</price>
    <price_base>12.99</price_base>
    <publisher>Stackoverflow</publisher>
    <category>PHP</category>
    <title>Check if child exists? - SimpleXML (PHP)</title>
    <url>http://stackoverflow.com/questions/ask</url>
  </product>
</products>";
$xml = simplexml_load_string($xml);
$coll = [];
# fields to look for
foreach (['desc', 'title', 'price', 'publisher', 'category', 'platform', 'image', 'whatever'] as $path) {
    # get the first matching node
    $result = $xml->xpath("product/{$path}[1]");
    # store its text, or null if the field is missing
    $coll[$path] = $result ? (string)$result[0] : null;
    # if you need a local variable here, use a variable variable (2 x $)
    ${$path} = $coll[$path];
}
# array_filter() removes the NULL (and any other empty) entries
print_r(array_filter($coll));
# if you need local variables, this creates $desc, $price, etc.
extract($coll);
Note: </packshot> in your first example is a closing tag without a matching opening tag, which makes the XML invalid, so I removed it here.
XPath syntax: https://www.w3schools.com/xmL/xpath_syntax.asp
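The loop above queries product/{$path}[1] from the root, so with several <product> elements it takes each field from the first product that has it. To check every product individually, as the question asks, the same idea can be applied per <product> with a relative XPath. A minimal sketch; productFields() and the two-product sample are hypothetical:

```php
<?php
$xml = simplexml_load_string('<products>
  <product><title>First</title><publisher>Stackoverflow</publisher></product>
  <product><title>Second</title><platform>PC</platform></product>
</products>');

/** Maps each field name to the first matching child's text, or null if the product lacks it. */
function productFields(SimpleXMLElement $product, array $fields): array
{
    $out = [];
    foreach ($fields as $field) {
        // Relative path: evaluated against this <product> only, not the whole document
        $result = $product->xpath("{$field}[1]");
        $out[$field] = $result ? (string) $result[0] : null;
    }
    return $out;
}

$rows = [];
foreach ($xml->product as $product) {
    $rows[] = productFields($product, ['title', 'publisher', 'platform']);
}
print_r($rows);
```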
Firstly, you're over-complicating your code by switching from SimpleXML to DOM with dom_import_simplexml. The things you're doing with DOM can be done in much shorter code with SimpleXML.
Instead of this:
$node = dom_import_simplexml($item);
$title = $node->getElementsByTagName('title')->item(0)->textContent;
you can just use:
$title = (string)$item->title[0];
or even just:
$title = (string)$item->title;
To understand why this works, take a look at the SimpleXML examples in the manual.
Armed with that knowledge, you'll be amazed at how simple it is to see if a child exists or not:
if (isset($item->title)) {
    $title = (string)$item->title;
} else {
    echo "There is no title!";
}
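Applied to the question's loop, the isset() check gives each product its own set of fields, with null for the ones it doesn't have. A minimal sketch using the question's second XML (with the closing </url> tag written correctly); optionalFields() is a hypothetical helper:

```php
<?php
$xml = simplexml_load_string('<products>
  <product>
    <image></image>
    <title>Questions</title>
    <price>23,90</price>
    <url>google.de</url>
    <platform>Stackoverflow</platform>
  </product>
</products>');

/** Returns [tag => text] for each tag, or null when $item has no such child. */
function optionalFields(SimpleXMLElement $item, array $tags): array
{
    $fields = [];
    foreach ($tags as $tag) {
        // isset() is true only if at least one <$tag> child exists
        $fields[$tag] = isset($item->$tag) ? (string) $item->$tag : null;
    }
    return $fields;
}

foreach ($xml->product as $item) {
    print_r(optionalFields($item, ['title', 'price', 'url', 'publisher', 'category', 'platform']));
}
```

No DOM conversion is needed; the same loop works for every XML file regardless of which optional children it uses.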

PHP SimpleXmlElement get XML tag with colon in it? [duplicate]

Get child elements in xml with PHP

I have an XML file that I need to parse to get values. Below is a snippet of the XML:
<?xml version="1.0"?>
<mobile>
  <userInfo>
  </userInfo>
  <CATALOG>
    <s0>
      <SUB0>
        <DESCR>Paranormal Studies</DESCR>
        <SUBJECT>147</SUBJECT>
      </SUB0>
    </s0>
    <sA>
      <SUB0>
        <DESCR>Accounting</DESCR>
        <SUBJECT>ACCT</SUBJECT>
      </SUB0>
      <SUB1>
        <DESCR>Accounting</DESCR>
        <SUBJECT>ACCTG</SUBJECT>
      </SUB1>
      <SUB2>
        <DESCR>Anatomy</DESCR>
        <SUBJECT>ANATOMY</SUBJECT>
      </SUB2>
      <SUB3>
        <DESCR>Anthropology</DESCR>
        <SUBJECT>ANTHRO</SUBJECT>
      </SUB3>
      <SUB4>
        <DESCR>Art</DESCR>
        <SUBJECT>ART</SUBJECT>
      </SUB4>
      <SUB5>
        <DESCR>Art History</DESCR>
        <SUBJECT>ARTHIST</SUBJECT>
      </SUB5>
    </sA>
So I need to grab all the child elements of <sA>, and after it there are more elements called <sB>, etc.
But I don't know how to get all of the child elements of <sA>, <sB>, and so on.
How about this:
$xmlstr = LoadTheXMLFromSomewhere(); // placeholder: load the XML string however you like
$xml = simplexml_load_string($xmlstr);
$result = $xml->xpath('//sA');
foreach ($result as $node) {
    // $node is an <sA> element; $node->children() gives its SUBn children
}
PHP has a handy class for accessing XML, called SimpleXML for a reason; consider using it heavily if your code only needs to read parts of the XML (i.e. query it). Also consider writing those queries in XPath, which is usually the most concise way to do it.
Note that this example only handles sA nodes, but you can easily adapt it to other nodes (for example '//sB').
Hope this helps!
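To cover <sA>, <sB> and every other group without naming each one, you can iterate over all children of <CATALOG> and then over each group's <SUBn> children. A minimal sketch, assuming the structure shown in the question (the sample below is a trimmed, properly closed copy); subjectsByGroup() is a hypothetical name:

```php
<?php
$xmlstr = '<?xml version="1.0"?>
<mobile>
  <userInfo></userInfo>
  <CATALOG>
    <s0><SUB0><DESCR>Paranormal Studies</DESCR><SUBJECT>147</SUBJECT></SUB0></s0>
    <sA>
      <SUB0><DESCR>Accounting</DESCR><SUBJECT>ACCT</SUBJECT></SUB0>
      <SUB1><DESCR>Accounting</DESCR><SUBJECT>ACCTG</SUBJECT></SUB1>
    </sA>
  </CATALOG>
</mobile>';

/** Groups SUBJECT => DESCR pairs by their parent element name (s0, sA, sB, ...). */
function subjectsByGroup(string $xmlString): array
{
    $xml = simplexml_load_string($xmlString); // $xml is the root <mobile> element
    $groups = [];
    foreach ($xml->CATALOG->children() as $group) {   // <s0>, <sA>, <sB>, ...
        foreach ($group->children() as $sub) {        // <SUB0>, <SUB1>, ...
            $groups[$group->getName()][(string) $sub->SUBJECT] = (string) $sub->DESCR;
        }
    }
    return $groups;
}

print_r(subjectsByGroup($xmlstr));
```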
You should look into simplexml_load_string(); it would make your life a lot easier. It returns a SimpleXMLElement object that you can use like so:
$xml = simplexml_load_string($xmlString); // $xmlString holds your XML
// $xml is already the root <mobile> element
foreach ($xml->CATALOG->sA->children() as $value) {
    // do things with sA children
}
Or with DOMDocument:
$xml = new DOMDocument();
$xml->load('path_to_xml');
$catalog = $xml->getElementsByTagName('CATALOG')->item(0);
$nodes = $catalog->getElementsByTagName('sA')->item(0)->childNodes;
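The same traversal over every group with DOMDocument. Note that childNodes also contains the whitespace text nodes between elements, so the sketch skips anything that isn't an element. The sample (taken from the question's data) and subjectsByGroupDom() are hypothetical:

```php
<?php
$xmlstr = '<mobile>
  <CATALOG>
    <s0>
      <SUB0><DESCR>Paranormal Studies</DESCR><SUBJECT>147</SUBJECT></SUB0>
    </s0>
    <sA>
      <SUB0><DESCR>Accounting</DESCR><SUBJECT>ACCT</SUBJECT></SUB0>
      <SUB1><DESCR>Accounting</DESCR><SUBJECT>ACCTG</SUBJECT></SUB1>
    </sA>
  </CATALOG>
</mobile>';

/** Returns [groupName => [SUBJECT, ...]] by walking element children only. */
function subjectsByGroupDom(string $xmlString): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xmlString);
    $catalog = $doc->getElementsByTagName('CATALOG')->item(0);

    $groups = [];
    foreach ($catalog->childNodes as $group) {
        if ($group->nodeType !== XML_ELEMENT_NODE) {
            continue; // skip whitespace text nodes
        }
        foreach ($group->childNodes as $sub) {
            if ($sub->nodeType === XML_ELEMENT_NODE) {
                $groups[$group->nodeName][] = $sub->getElementsByTagName('SUBJECT')->item(0)->textContent;
            }
        }
    }
    return $groups;
}

print_r(subjectsByGroupDom($xmlstr));
```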

PHP DOMDocument getting Attribute of Tag

Hello, I have an API response in XML format with a series of items such as this:
<item>
  <title>blah balh</title>
  <pubDate>Tue, 20 Oct 2009 </pubDate>
  <media:file date="today" data="example text string"/>
</item>
I want to use DOMDocument to get the attribute "data" from the tag "media:file". My attempt below doesn't work:
$xmldoc = new DOMDocument();
$xmldoc->load('api response address');
foreach ($xmldoc->getElementsByTagName('item') as $feeditem) {
    $nodes = $feeditem->getElementsByTagName('media:file');
    $linkthumb = $nodes->item(0)->getAttribute('data');
}
What am I doing wrong? Please help.
EDIT: Mark, for some reason I can't leave comments. I get the error
Call to a member function getAttribute() on a non-object
when I run my code. I have also tried
$nodes = $feeditem->getElementsByTagNameNS('uri','file');
$linkthumb = $nodes->item(0)->getAttribute('data');
where uri is the URI of the media namespace, but I get the same problem.
Note that the media element is of the form … not …; I think this is part of the problem, as I generally have no issue parsing attributes.
The example you provided should not generate an error. I tested it, and $linkthumb contained the string "example text string" as expected.
Make sure the media namespace is declared in the returned XML; otherwise DOMDocument will error out.
If you are getting a specific error, please edit your post to include it.
Edit:
Try the following code:
$xmldoc = new DOMDocument();
$xmldoc->load('api response address');
foreach ($xmldoc->getElementsByTagName('item') as $feeditem) {
    // Look the element up by its local name, without the "media:" prefix
    $nodes = $feeditem->getElementsByTagName('file');
    $linkthumb = $nodes->item(0)->getAttribute('data');
    echo $linkthumb;
}
You may also want to look at SimpleXML and XPath, which make reading XML much easier than DOMDocument.
Alternatively, read the attribute through the node's attribute map:
$DOMNode->attributes->getNamedItem('MyAttribute')->value;
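When the feed declares the prefix, getElementsByTagNameNS() works once it is given the exact namespace URI from the xmlns:media declaration and the local name file. A minimal sketch, assuming the API uses the Media RSS URI http://search.yahoo.com/mrss/ (check the xmlns:media attribute in your response and substitute it if it differs); the sample response and mediaFileData() are hypothetical. It also checks the node list's length, so an item without a matching element yields null instead of a "Call to a member function getAttribute()" error:

```php
<?php
const MEDIA_NS = 'http://search.yahoo.com/mrss/'; // assumed; copy the URI from your feed

$response = <<<XML
<rss xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>blah balh</title>
      <media:file date="today" data="example text string"/>
    </item>
    <item>
      <title>no media here</title>
    </item>
  </channel>
</rss>
XML;

/** Returns the data attribute of each item's <media:file>, or null if the item has none. */
function mediaFileData(string $xmlString): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xmlString);

    $values = [];
    foreach ($doc->getElementsByTagName('item') as $feeditem) {
        // Namespace URI + local name, not the prefixed "media:file"
        $nodes = $feeditem->getElementsByTagNameNS(MEDIA_NS, 'file');
        $values[] = $nodes->length > 0 ? $nodes->item(0)->getAttribute('data') : null;
    }
    return $values;
}

print_r(mediaFileData($response));
```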
