PHP SimpleXML read comment with xPath - php

I'm loading an XML feed, which works.
But it seems that I am missing something in my XPath query.
I want to read the comment inside the title node, but it doesn't seem to work.
$xml = simplexml_load_file( $feed_url );
$comment = $xml->xpath('//channel/item[1]/title//comment()');
The feed has the following structure:
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>My main feed</title>
<description>feed description</description>
<link>http://www.example.com</link>
<language>en_US</language>
<lastBuildDate>Wed, 26 Nov 2014 11:44:13 UTC</lastBuildDate>
<item>
<title><!-- Special Image --></title>
<link></link>
<guid>http://www.example.com/page/345</guid>
<media:category>horizontal</media:category>
<media:thumbnails>
<media:thumbnail url="www.example.com/test.jpg" type="image/jpeg" height="324" width="545" />
</media:thumbnails>
</item>
<item>
<title>Here's a normal title</title>
<description><![CDATA[Description Text]]></description>
<link></link>
<guid>http://www.example.com/page/123</guid>
<media:category>horizontal</media:category>
</item>
</channel>
</rss>
Does anyone have a clue how I could read the comment?

SimpleXML does not expose comment nodes, so xpath() cannot return them. You could use DOMDocument with DOMXPath to access those comments instead. Example:
$dom = new DOMDocument();
$dom->load($feed_url);
$xpath = new DOMXPath($dom);
$comment = $xpath->evaluate('string(//channel/item[1]/title/comment())');
echo $comment;
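A self-contained sketch of the same idea, assuming the feed is available as a string (the inline $feedXml below is a trimmed copy of the feed from the question, not the real URL). It uses query() to fetch the comment node itself rather than only its string value, which also makes it possible to tell "no comment" apart from "empty comment":

```php
<?php
// Trimmed copy of the feed from the question (assumption: inline string instead of a URL).
$feedXml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<item>
<title><!-- Special Image --></title>
</item>
<item>
<title>Here's a normal title</title>
</item>
</channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($feedXml);
$xpath = new DOMXPath($dom);

// query() returns a DOMNodeList; each entry here is a DOMComment node.
$comments = $xpath->query('//channel/item[1]/title/comment()');

// Use null when the title holds no comment; trim the padding inside the comment.
$comment = $comments->length > 0 ? trim($comments->item(0)->nodeValue) : null;

echo $comment, PHP_EOL;
```

The trim() call is a design choice for this sketch: the comment text in the feed is padded with spaces, and most callers will want the bare label.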

Related

How can I retrieve specific XML tag names?

The feed above returns an XML document. I can successfully retrieve tag names like title, description and link using this code:
$xml = file_get_contents($feed_url);
$xml = trim($xml);
$xmlObject = new SimpleXmlElement($xml);
foreach ($xmlObject->channel->item as $item) {
$title = strip_tags($item->title);
$description = strip_tags($item->description);
}
How can I get <a10:updated>?
<rss xmlns:a10="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>title</title>
<link>link</link>
<description>news</description>
<item>
<guid isPermaLink="true">link</guid>
<link>link</link>
<title>Tiele</title>
<description>Descr</description>
<enclosure url="image" type="image/jpeg"/>
<a10:updated>2017-05-07T09:14:00+03:00</a10:updated>
</item>
</channel>
</rss>
Here we use DOMDocument to extract the data from the namespaced tag:
<?php
ini_set('display_errors', 1);
$xml='<rss xmlns:a10="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>title</title>
<link>link</link>
<description>news</description>
<item>
<guid isPermaLink="true">link</guid>
<link>link</link>
<title>Tiele</title>
<description>Descr</description>
<enclosure url="image" type="image/jpeg"/>
<a10:updated>2017-05-07T09:14:00+03:00</a10:updated>
</item>
</channel>
</rss>';
$xmlObject = new DOMDocument();
$xmlObject->loadXML($xml);
$result=$xmlObject->getElementsByTagNameNS("http://www.w3.org/2005/Atom", "*");
print_r($result->item(0)->textContent);
Output:
2017-05-07T09:14:00+03:00
You're looking at a different XML namespace there. Plain property access only sees elements in the default namespace, so select the namespaced children by prefix first:
$a10 = $item->children('a10', true)->updated;
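For reference, a minimal self-contained sketch of reading the namespaced element with SimpleXML, assuming a shortened copy of the feed above as an inline string. It selects children by namespace URI rather than by prefix, which keeps working if the feed ever declares the Atom namespace under a different prefix. The variable names are illustrative only.

```php
<?php
// Shortened version of the feed from the question (assumption: inline string).
$xml = <<<'XML'
<rss xmlns:a10="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<item>
<title>Tiele</title>
<a10:updated>2017-05-07T09:14:00+03:00</a10:updated>
</item>
</channel>
</rss>
XML;

$feed = new SimpleXMLElement($xml);

$updatedValues = [];
foreach ($feed->channel->item as $item) {
    // children() with a namespace URI returns only the elements in that namespace.
    $atom = $item->children('http://www.w3.org/2005/Atom');
    $updatedValues[] = (string) $atom->updated;
}

print_r($updatedValues);
```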

Symfony 2 test xml with Symfony\Component\DomCrawler\Crawler

I've got a URL that returns XML, but I have a problem extracting the "link" element.
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<item>
<id>123</id>
<title>my title</title>
<link>
http://example.org
</link>
</item>
</channel>
</rss>
I need to test it with
Symfony\Component\DomCrawler\Crawler
These are my tests:
$crawler = $this->client->get('/my-feed');
$items = $crawler->filterXPath('//channel/item');
$this->assertGreaterThanOrEqual(1, $items->count()); // ok pass
// ...
$titles = $items->filterXPath('//title')->extract(array('_text'));
$this->assertContains("my title", $titles); // ok pass
// ...
$links = $items->filterXPath('//link')->extract(array('_text'));
$this->assertContains("example.org", $links); // KO!!! don't pass
var_dump($links); // empty string
Is "link" a reserved word?
Your XML is broken:
you don't have a closing channel node </channel>
you don't have a closing rss node </rss>
Here is the corrected XML:
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<item>
<id>123</id>
<title>my title</title>
<link>http://example.org</link>
</item>
</channel>
</rss>
Then, ->extract() returns an array of extracted values, so you shouldn't inspect the return value directly. Take the first element and run your assertion on it:
$this->assertContains("my title", $titles[0]);
// ...
$this->assertContains("example.org", $links[0]);

Adding version to a Generated XML tag

I'm trying to generate an RSS feed as XML, and I found a site that gives an example of how the XML should look:
<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>The Channel Title Goes Here</title>
<description>The explanation of how the items are related goes here</description>
<link>http://www.directoryoflinksgohere</link>
<item>
<title>The Title Goes Here</title>
<description>The description goes here</description>
<link>http://www.linkgoeshere.com</link>
</item>
<item>
<title>Another Title Goes Here</title>
<description>Another description goes here</description>
<link>http://www.anotherlinkgoeshere.com</link>
</item>
</channel>
</rss>
However, when I check my current XML, I notice that the version is missing from the xml and rss tags.
<xml>
<rss>
<channel>
<title>#####</title>
<description>
#####
</description>
<path>#####</path>
How can I add the version to the opening tags of XML and RSS?
PHP
$newspages = $this->newspages;
$xml = new SimpleXMLElement('<xml/>');
$rss = $xml->addChild('rss');
$channel = $rss->addChild('channel');
$channel->addChild('title', txt('rss.channelname'));
$channel->addChild('description', txt('rss.channeldescription'));
$channel->addChild('path', 'http://'.$_SERVER['HTTP_HOST']);
foreach ($newspages as $newspage) {
if ($newspage['id'] !== 'news-archive') {
$item = $channel->addChild('item');
$item->addChild('title', $newspage['title']);
$item->addChild('description', $newspage['description']);
$item->addChild('path', url('###/pageid', array('language'=>$this->language, 'id'=>$newspage['id'])));
}
}
header('Content-Type: text/xml');
print($xml->asXML());
Use the addAttribute() method.
<?php
$xml = new SimpleXMLElement('<xml/>');
$xml->addAttribute('version', '1.0');
$rss = $xml->addChild('rss');
$rss->addAttribute('version', '2.0');
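Note that the xml line in the target example is the XML declaration rather than an element, and asXML() writes it automatically when called on the root element. A minimal sketch of an alternative layout, assuming the feed should have rss as its root as in the example the question quotes. The channel and item values are placeholders, not the asker's data.

```php
<?php
// Start from <rss> as the root element; the version attribute can be set inline.
$rss = new SimpleXMLElement('<rss version="2.0"/>');

$channel = $rss->addChild('channel');
$channel->addChild('title', 'The Channel Title Goes Here');
$channel->addChild('description', 'Placeholder description');
$channel->addChild('link', 'http://www.example.com');

$item = $channel->addChild('item');
$item->addChild('title', 'The Title Goes Here');
$item->addChild('link', 'http://www.example.com/page/1');

// asXML() on the root element prepends the XML declaration on its own.
$output = $rss->asXML();
echo $output;
```

addAttribute() remains the right call when the attribute value is only known after the element has been created.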

Modify XML via PHP and output xml (almost like a filter a feed)

In PHP, I have to filter this RSS feed and output a new RSS feed.
Read the feed, find the items where dc:creator == badcreatorname, remove those items, and output the rest of the feed in the same order.
How do I do that? Please help.
<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<title>LInk.com title</title>
<link>http://link.com</link>
<description>Link.com</description>
<image>
<url>http://www.link.com/media/feedlogo.gif</url>
<title>link.com</title>
<link>http://link.com</link>
</image>
<language>en-us</language>
<copyright>Copyright 2012 </copyright>
<generator>Generator</generator>
<item>
<title><![CDATA[Title goes here]]></title>
<link><![CDATA[http://link.com/]]></link>
<guid isPermaLink="true"><![CDATA[http://link.com/]]></guid>
<description><![CDATA[ Desc]]></description>
<dc:creator><![CDATA[badcreatorname]]></dc:creator>
<pubDate>Mon, 17 Sep 2012 15:16:00 EST</pubDate>
<dc:identifier>898980</dc:identifier>
<category domain="category"><![CDATA[cat]]></category>
<category domain="blogger:name"><![CDATA[cat]]></category>
<enclosure url="pic" length="" type="image/jpeg"/>
</item>
<item>
<title><![CDATA[Title1 goes here]]></title>
<link><![CDATA[http://link1.com/]]></link>
<guid isPermaLink="true"><![CDATA[http://link1.com/]]></guid>
<description><![CDATA[ Desc1]]></description>
<dc:creator><![CDATA[goodcreatorname]]></dc:creator>
<pubDate>Mon, 17 Sep 2012 15:16:00 EST</pubDate>
<dc:identifier>8989801</dc:identifier>
<category domain="category"><![CDATA[cat]]></category>
<category domain="blogger:name"><![CDATA[cat]]></category>
<enclosure url="pic" length="" type="image/jpeg"/>
</item>
</channel>
</rss>
<?php
// Load our XML document
$doc = new DOMDocument();
$doc->load('feed.xml');
// Create an XPath object and register our namespaces so we can
// find the nodes that we want
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('p', 'http://purl.org/dc/elements/1.1/');
// Find the <item> nodes that have a badcreatorname in there
$nodes = $xpath->query("/rss/channel/item[p:creator/text()[ . = 'badcreatorname' ]]");
// Remove the offending nodes from the DOM
for ($i = 0; $i < $nodes->length; $i++) {
$node = $nodes->item($i);
$node->parentNode->removeChild($node);
}
// Write our updated XML back to a new file
$doc->save('feedout.xml');
// Or if you want to send the XML content to stdout:
echo $doc->saveXML();
?>
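A self-contained variant of the same approach that can be run without feed.xml, assuming a reduced two-item copy of the feed as an inline string. It compares the element's string value with normalize-space() and copies the node list into an array before removing anything. The $blocked variable and the title check at the end are additions for illustration.

```php
<?php
// Reduced two-item feed based on the one in the question (assumption: inline string).
$feedXml = <<<'XML'
<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<title>Link.com title</title>
<item>
<title><![CDATA[Title goes here]]></title>
<dc:creator><![CDATA[badcreatorname]]></dc:creator>
</item>
<item>
<title><![CDATA[Title1 goes here]]></title>
<dc:creator><![CDATA[goodcreatorname]]></dc:creator>
</item>
</channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($feedXml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');

// Hypothetical variable holding the creator name to filter out.
$blocked = 'badcreatorname';

// Comparing the element's string value works for CDATA and plain text alike.
$matches = $xpath->query("/rss/channel/item[normalize-space(dc:creator) = '$blocked']");

// Copy the list into an array first so removal cannot disturb the iteration.
foreach (iterator_to_array($matches) as $item) {
    $item->parentNode->removeChild($item);
}

// Collect the titles that are left, in document order.
$remainingTitles = [];
foreach ($xpath->query('/rss/channel/item/title') as $title) {
    $remainingTitles[] = $title->textContent;
}

print_r($remainingTitles);
```

Interpolating $blocked into the expression is only safe here because the value is a fixed literal; a name taken from user input would need quoting first.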

Accessing a single XML DOM Document node

I am completely new to DOM documents. Basically, what I am trying to do is load an RSS feed, select only one node, and then save it to an XML file.
Here is the XML I am loading from a web feed:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Markets</title>
<description/>
<link>http://www.website.com</link>
<language>en-us</language>
<copyright>XML Output Copyright</copyright>
<ttl>15</ttl>
<pubDate>Tue, 16 Nov 2010 09:38:00 +0000</pubDate>
<webMaster>admin#website.com</webMaster>
<image>
<title>title</title>
<url>http://www.website.com/images/xmllogo.gif</url>
<link>http://www.website.com</link>
<width>144</width>
<height>16</height>
</image>
<item>
<title>title</title>
<description>the description goes here
</description>
<enclosure url="http://www.website.com/images/image.png" type="image/png"/>
</item>
</channel>
</rss>
Here is my lame attempt at getting the <description> node and saving it to feed.xml:
<?php
$feed = new DOMDocument();
$feed->load('http://www.website.com/directory/directory/cz.c');
$nodeValue = $feed->getElementsByTagName('description')->item(0)->nodeValue;
$feed->save("feed.xml");
?>
So basically I need to get the description tag and save it as an XML file.
Any help would be appreciated, thanks in advance!
Almost correct. To get the "outerXml" of a node, you can pass the node to saveXML():
$feed = new DOMDocument();
$feed->load('http://www.website.com/directory/directory/cz.c');
$xml = $feed->saveXml($feed->getElementsByTagName('description')->item(0));
file_put_contents("feed.xml", $xml);
Saving with file_put_contents will not include an XML prolog. Note that in your example, the first description element is empty, so the file will contain <description/>.
If you want to extract the node as a standalone XML document, you have to instantiate a new DOMDocument, import the DOMNode, and then use save().
$dom = new DOMDocument($feed->xmlVersion, $feed->xmlEncoding);
$dom->appendChild(
$dom->importNode(
$feed->getElementsByTagName('description')->item(0),
TRUE
)
);
echo $dom->save('new.xml');
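Because the first description in this feed is the empty channel-level one, it may be more useful to target the item's description explicitly. A minimal sketch, assuming a cut-down copy of the feed as an inline string and using saveXML() to return the result as a string instead of writing a file. The XPath expression and variable names are illustrative.

```php
<?php
// Cut-down feed: the channel-level <description/> is empty, the item one has text.
$feedXml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Markets</title>
<description/>
<item>
<title>title</title>
<description>the description goes here</description>
</item>
</channel>
</rss>
XML;

$feed = new DOMDocument();
$feed->loadXML($feedXml);

// Select the item's description rather than the first description in the document.
$xpath = new DOMXPath($feed);
$node = $xpath->query('/rss/channel/item/description')->item(0);

// Import a deep copy into a fresh document so the output gets its own XML declaration.
$out = new DOMDocument('1.0', 'UTF-8');
$out->appendChild($out->importNode($node, true));

$standalone = $out->saveXML();
echo $standalone;
```

Swapping saveXML() for save('new.xml') writes the same document to disk.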
Try this:
$feed = simplexml_load_file('feed.xml');
$descr = $feed->channel->description;
