Parsing WordPress XML, slash:comments syntax?

This is really just a syntax question.
I have a PHP script that parses my WordPress feed and returns the latest posts. I also want the script to read the number of comments, but the feed element that holds the comment count has a colon in its name (slash:comments). Accessing it causes the following error:
Parse error: syntax error, unexpected ':' in ... on line ...
I have tried each of the following without luck:
$xml->slash:comments
$comments = 'slash:comments'
$xml->$comments
$xml->slash.':'.comments
$xml->{slash:comments}
$xml->{'slash:comments'}
How do I parse an object with a colon?

Alternatively, you can use xpath() to access the nodes. Given the following as an xml string:
<entry>
<id>http://gdata.youtube.com/feeds/api/videos/xyz12345678</id>
<published>2007-01-17T23:41:00.000Z</published>
<updated>2010-11-14T03:52:25.000Z</updated>
<yt:location>Mount Washington Observatory, NH</yt:location>
<media:group>
<media:title type='plain'>Example of a Title</media:title>
<media:duration seconds='126'/>
</media:group>
</entry>
You could do this:
$xml = simplexml_load_string($xmlString); // $xmlString holds the XML shown above
$location = $xml->xpath('yt:location');
echo($location[0]); // output: "Mount Washington Observatory, NH"
$title = $xml->xpath('media:group/media:title');
echo($title[0]); // output: "Example of a Title"
$duration = $xml->xpath('media:group/media:duration');
echo($duration[0]['seconds']); // output: "126"
As you can see, to get the nodes with colons, you may use xpath() with a relative path to the node.
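One caveat: the xpath() calls above only resolve if the yt and media prefixes are bound to namespace URIs, normally through xmlns attributes on the element or one of its ancestors. The snippet above does not show those declarations, so the following is a minimal sketch that assumes the two URIs below are the ones your feed declares (check your own feed). registerXPathNamespace() makes the binding explicit:

```php
<?php
// Assumption: these namespace URIs are illustrative; use the ones your feed declares.
$xmlString = <<<XML
<entry xmlns:media="http://search.yahoo.com/mrss/"
       xmlns:yt="http://gdata.youtube.com/schemas/2007">
  <yt:location>Mount Washington Observatory, NH</yt:location>
  <media:group>
    <media:title type="plain">Example of a Title</media:title>
    <media:duration seconds="126"/>
  </media:group>
</entry>
XML;

$xml = simplexml_load_string($xmlString);

// Bind each prefix used in the XPath expressions to its namespace URI.
$xml->registerXPathNamespace('yt', 'http://gdata.youtube.com/schemas/2007');
$xml->registerXPathNamespace('media', 'http://search.yahoo.com/mrss/');

$location = $xml->xpath('yt:location');
$duration = $xml->xpath('media:group/media:duration');

// Cast to string to get plain values instead of SimpleXMLElement objects.
$locationText = (string) $location[0];
$seconds = (string) $duration[0]['seconds'];

echo $locationText, "\n", $seconds, "\n";
```

Registering the prefixes yourself means the queries keep working even if the feed picks different prefix names for the same URIs.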

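A PHP identifier can never contain a colon, which is why $xml->slash:comments is a syntax error. In the feed, the colon separates a namespace prefix (slash) from the element name (comments), so you should check how your XML parser exposes namespaced elements.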
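A minimal sketch of how SimpleXML handles this, assuming the feed binds the slash prefix to the RSS Slash module URI as in the inline sample below (the sample feed itself is made up for illustration). Calling children() with a prefix returns the elements in that namespace, after which comments is an ordinary property:

```php
<?php
// Assumption: a trimmed-down stand-in for a WordPress RSS feed.
$feed = <<<XML
<rss version="2.0" xmlns:slash="http://purl.org/rss/1.0/modules/slash/">
  <channel>
    <item>
      <title>Hello world</title>
      <slash:comments>3</slash:comments>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($feed);

$counts = [];
foreach ($xml->channel->item as $item) {
    // Second argument true means 'slash' is a prefix, not a namespace URI.
    $slash = $item->children('slash', true);
    $counts[(string) $item->title] = (int) $slash->comments;
}

print_r($counts);
```

Unlike a string replacement on the raw feed, this leaves the XML untouched and cannot accidentally rewrite post content that happens to contain the same text.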

Use str_replace to rename the colon-containing tag before parsing, so that simplexml_load_string exposes it as an ordinary property name.
For example:
// Fetch the feed as a raw string.
$string = file_get_contents("http://domain.tld/?feed=rss2");
// Rename the namespaced tag so it becomes a plain property name.
$string = str_replace('slash:comments', 'slashcomments', $string);
$xml = simplexml_load_string($string);
foreach ($xml->channel->item as $val) {
    echo $val->pubDate . '<br />';
    echo $val->title . '<br />';
    echo $val->slashcomments . '<br /><br />';
}
... should return the published date, title, and number of comments of the posts listed in a WordPress feed. My code is more advanced, but this illustrates the workaround.
Thank you, Arda Xi, for your help!

Related

PHP not returning full XML file contents [duplicate]

I'm trying to read an RSS feed from Flickr but it has some nodes which are not readable by Simple XML (media:thumbnail, flickr:profile, and so on).
How do I get round this? My head hurts when I look at the documentation for the DOM. So I'd like to avoid it as I don't want to learn.
I'm trying to get the thumbnail by the way.

The solution is explained in this nice article. You need the children() method for accessing XML elements which contain a namespace. This code snippet is quoted from the article:
$feed = simplexml_load_file('http://www.sitepoint.com/recent.rdf');
foreach ($feed->item as $item) {
    $ns_dc = $item->children('http://purl.org/dc/elements/1.1/');
    echo $ns_dc->date;
}
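The same children() approach can be applied to the thumbnail the question asks about. This is only a sketch: the inline feed is a made-up stand-in for the Flickr feed, and the Media RSS URI is assumed to be the one bound to the media prefix there. The attribute is read through attributes(), since the thumbnail URL lives in an attribute rather than in the element text:

```php
<?php
// Assumption: a simplified stand-in for the Flickr feed.
$rss = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Sunset</title>
      <media:thumbnail url="http://example.com/thumb.jpg" height="75" width="75"/>
    </item>
  </channel>
</rss>
XML;

$feed = simplexml_load_string($rss);

$thumbs = [];
foreach ($feed->channel->item as $item) {
    // Select the children that belong to the media namespace.
    $media = $item->children('http://search.yahoo.com/mrss/');
    // The url attribute itself has no namespace, so plain attributes() finds it.
    $attrs = $media->thumbnail->attributes();
    $thumbs[] = (string) $attrs['url'];
}

print_r($thumbs);
```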

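Curly brackets make a property name that contains a colon valid PHP syntax:
$item->{'itunes:duration'}
Whether SimpleXML then finds the element is another matter: the WordPress question at the top of this page reports that $xml->{'slash:comments'} did not work, so children() or xpath() is the more reliable route.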

You're dealing with a namespace? I think you need to use the ->children method.
$ns_dc = $item->children('http://namespace.org/');
Can you provide a snippet with the xml declaration?

An even simpler way to access namespaced XML nodes in PHP, without declaring a namespace, is DOMDocument's getElementsByTagName(), which matches on the local name (the part after the colon).
To get the value of <su:authorEmail> from the following source:
<item>
<title>My important article</title>
<pubDate>Mon, 29 Feb 2017 00:00:00 +0000</pubDate>
<link>https://myxmlsource.com/32984</link>
<guid>https://myxmlsource.com/32984</guid>
<author>Blogs, Jo</author>
<su:departments>
<su:department>Human Affairs</su:department>
</su:departments>
<su:authorHash>4f329b923419b3cb2c654d615e22588c</su:authorHash>
<su:authorEmail>hIwW14tLc+4l/oo7agmRrcjwe531u+mO/3IG3xe5jMg=</su:authorEmail>
<dc:identifier>/32984/Download/0032984-11042.docx</dc:identifier>
<dc:format>Journal article</dc:format>
<dc:creator>Blogs, Jo</dc:creator>
<slash:comments>0</slash:comments>
</item>
Use the following code:
$rss = new DOMDocument();
$rss->load('https://myxmlsource.com/rss/xml');
$nodes = $rss->getElementsByTagName('item');
foreach ($nodes as $node) {
    // Look up each element by its local name, without the namespace prefix.
    $title = $node->getElementsByTagName('title')->item(0)->nodeValue;
    $author = $node->getElementsByTagName('author')->item(0)->nodeValue;
    $authorHash = $node->getElementsByTagName('authorHash')->item(0)->nodeValue;
    $department = $node->getElementsByTagName('department')->item(0)->nodeValue;
    // decryptEmail() is the answerer's own helper; it is not shown here.
    $email = decryptEmail($node->getElementsByTagName('authorEmail')->item(0)->nodeValue);
}
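To try the local-name lookup without a remote feed, here is a small sketch on an inline string. The su namespace URI is hypothetical (the source above does not show it), and the sample is trimmed to two elements. It also shows getElementsByTagNameNS(), which is the stricter option when two namespaces use the same local name:

```php
<?php
// Assumption: the su URI is a placeholder; the slash URI is the RSS Slash module.
$xml = <<<XML
<rss version="2.0"
     xmlns:su="http://example.com/ns/su"
     xmlns:slash="http://purl.org/rss/1.0/modules/slash/">
  <channel>
    <item>
      <title>My important article</title>
      <su:authorHash>4f329b923419b3cb2c654d615e22588c</su:authorHash>
      <slash:comments>0</slash:comments>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

foreach ($doc->getElementsByTagName('item') as $item) {
    // Local-name lookup: no prefix and no namespace URI needed.
    $byLocalName = $item->getElementsByTagName('comments')->item(0)->nodeValue;

    // Namespace-aware lookup: namespace URI plus local name.
    $byNamespace = $item
        ->getElementsByTagNameNS('http://purl.org/rss/1.0/modules/slash/', 'comments')
        ->item(0)->nodeValue;

    echo $byLocalName, ' ', $byNamespace, "\n";
}
```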

How to fix XML parsing error with PHP? [duplicate]

I have the xml shown below and I want to parse out the product title. When I use the php code below, I get "Parse error: syntax error, unexpected ':' in /home/content/c/a/s/cashme/html/buylooper/xml.php on line 5" because of the ":" located in the tag. How do I resolve this?
*update: I've got the answer to the first part, but am having trouble in how to parse out an attribute of an xml tag. The tag I am having trouble with is the "s:image" tag (link attribute) inside the "s:images" tag.
<?php
$url = 'xml-file.xml';
$xml = simplexml_load_file($url);
$title = $xml->entry[0]->s:product->s:title;
//print
echo '<br/>';
echo $title;
?>
<entry gd:kind="shopping#product">
<s:product>
<s:googleId>9400569674928563633</s:googleId>
<s:author>
<s:name>Amazon.com</s:name>
<s:accountId>2860562</s:accountId>
</s:author>
<s:creationTime>2010-08-19T05:50:21.000Z</s:creationTime>
<s:modificationTime>2012-01-26T23:54:26.000Z</s:modificationTime>
<s:country>US</s:country>
<s:language>en</s:language>
<s:title>Canon powershot s95 10 mp digital camera with 3.8x wide angle optical image stabilized zoom and 3.0-inch lcd</s:title>
<s:description>desc</s:description>
<s:link>http://www.amazon.com/Canon-PowerShot-S95-Stabilized-3-0-Inch/dp/B003ZSHNGS</s:link>
<s:brand>Canon</s:brand>
<s:condition>new</s:condition>
<s:gtin>00013803126556</s:gtin>
<s:gtins>
<s:gtin>00013803126556</s:gtin>
</s:gtins>
<s:inventories>
<s:inventory channel="online" availability="inStock">
<s:price shipping="0.0" currency="USD">340.41</s:price>
</s:inventory>
</s:inventories>
<s:images>
<s:image link="http://ecx.images-amazon.com/images/I/519z3AjKzHL._SL500_AA300_.jpg"/>
</s:images>
</s:product>
</entry>

Parse the namespaces first.
$namespaces = $xml->getNamespaces(true);
$s = $xml->children($namespaces['s']);
echo (string) $s->product->title . "\n";
echo (string) $s->product->images->image->attributes()->link;

You need to get the correct namespace with getNamespaces(). This is untested, but might do the trick:
$url = 'xml-file.xml';
$xml = simplexml_load_file($url);
$namespaces = $xml->entry->getNamespaces(true);
// Get children of the correct namespace
$s = $xml->entry[0]->children($namespaces['s']);
$title = $s->product->title;
//print
echo '<br/>';
echo $title;
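Both parts of the question (the title and the link attribute of s:image) can be sketched together as follows. The wrapping feed element and the namespace URI are hypothetical, added only so that the s prefix is declared; the real feed will declare its own URI, which getNamespaces(true) picks up either way:

```php
<?php
// Assumption: <feed> wrapper and the namespace URI are placeholders.
$xmlString = <<<XML
<feed xmlns:s="http://example.com/ns/shopping">
  <entry>
    <s:product>
      <s:title>Canon PowerShot S95</s:title>
      <s:images>
        <s:image link="http://example.com/s95.jpg"/>
      </s:images>
    </s:product>
  </entry>
</feed>
XML;

$xml = simplexml_load_string($xmlString);

// Map of prefix => URI for every namespace declared in the document.
$namespaces = $xml->getNamespaces(true);

// Children of the first entry that belong to the 's' namespace.
$s = $xml->entry[0]->children($namespaces['s']);

$title = (string) $s->product->title;
// The link attribute is not namespaced, so attributes() without arguments finds it.
$link = (string) $s->product->images->image->attributes()->link;

echo $title, "\n", $link, "\n";
```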

PHP DOMDocument getting Attribute of Tag

Hello, I have an API response in XML format with a series of items such as this:
<item>
<title>blah balh</title>
<pubDate>Tue, 20 Oct 2009 </pubDate>
<media:file date="today" data="example text string"/>
</item>
I want to use DOMDocument to get the attribute "data" from the tag "media:file". My attempt below doesn't work:
$xmldoc = new DOMDocument();
$xmldoc->load('api response address');
foreach ($xmldoc->getElementsByTagName('item') as $feeditem) {
$nodes = $feeditem->getElementsByTagName('media:file');
$linkthumb = $nodes->item(0)->getAttribute('data');
}
What am I doing wrong? Please help.
EDIT: I can't leave comments for some reason Mark. I get the error
Call to a member function getAttribute() on a non-object
when I run my code. I have also tried
$nodes = $feeditem->getElementsByTagNameNS('uri','file');
$linkthumb = $nodes->item(0)->getAttribute('data');
where uri is the uri relating to the media name space(NS) but again the same problem.
Note that the media element is self-closing (<media:file ... />) rather than a paired open/close tag. I think this is part of the problem, as I generally have no issue parsing for attributes.

The example you provided should not generate an error. I tested it, and $linkthumb contained the string "example text string" as expected.
Ensure the media namespace is declared in the returned XML; otherwise DOMDocument will report an error.
If you are getting a specific error, please edit your post to include it.
Edit:
Try the following code:
$xmldoc = new DOMDocument();
$xmldoc->load('api response address');
foreach ($xmldoc->getElementsByTagName('item') as $feeditem) {
    // Search by local name only, without the media: prefix.
    $nodes = $feeditem->getElementsByTagName('file');
    $linkthumb = $nodes->item(0)->getAttribute('data');
    echo $linkthumb;
}
You may also want to look at SimpleXML and Xpath as it makes reading XML much easier than DOMDocument.

Alternatively:
$DOMNode->attributes->getNamedItem('MyAttribute')->value;
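Both attribute lookups mentioned in this thread can be tried on an inline string. This sketch assumes the media prefix is bound to the Media RSS URI shown below (the question does not say which URI the API uses), and it checks that a node was actually found before reading from it, which avoids the "Call to a member function getAttribute() on a non-object" error:

```php
<?php
// Assumption: the namespace URI is illustrative; use the one your API response declares.
$xml = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>blah blah</title>
      <media:file date="today" data="example text string"/>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$viaGetAttribute = [];
$viaNamedItem = [];
foreach ($doc->getElementsByTagName('item') as $item) {
    // Namespace-aware lookup: namespace URI plus local name.
    $nodes = $item->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'file');
    if ($nodes->length === 0) {
        continue; // No media:file in this item, so skip it instead of failing.
    }
    $file = $nodes->item(0);
    $viaGetAttribute[] = $file->getAttribute('data');
    $viaNamedItem[] = $file->attributes->getNamedItem('data')->value;
}

print_r($viaGetAttribute);
```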
