Problems on reading image url from a rss feed, using DOMDocument - php

I have an RSS feed:
<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<item>
<title>VIDEO: Have you heard of Alibaba?</title>
<description>Alibaba is the world's biggest e-commerce firm but most people in the West haven't heard of it.</description>
<link>http://www.bbc.co.uk/news/business-29216696#sa-ns_mchannel=rss&ns_source=PublicRSS20-sa</link>
<guid isPermaLink="false">http://www.bbc.co.uk/news/business-29216696</guid>
<pubDate>Tue, 16 Sep 2014 02:29:17 GMT</pubDate>
<media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/77609000/jpg/_77609399_73619721.jpg"/>
<media:thumbnail width="144" height="81" url="http://news.bbcimg.co.uk/media/images/77609000/jpg/_77609400_73619721.jpg"/>
</item>
<item>
<title>VIDEO: Phones 4U shops closing for business</title>
<description>Retailer Phones 4U has gone into administration putting 5,596 jobs at risk.</description>
<link>http://www.bbc.co.uk/news/business-29202179#sa-ns_mchannel=rss&ns_source=PublicRSS20-sa</link>
<guid isPermaLink="false">http://www.bbc.co.uk/news/business-29202179</guid>
<pubDate>Mon, 15 Sep 2014 22:15:50 GMT</pubDate>
<media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/77587000/jpg/_77587217_77587209.jpg"/>
<media:thumbnail width="144" height="81" url="http://news.bbcimg.co.uk/media/images/77587000/jpg/_77587218_77587209.jpg"/>
</item>
</rss>
I am able to read the title and description from this RSS feed using PHP's DOMDocument class.
Here is my code:
$xml = 'http://feeds.bbci.co.uk/news/video_and_audio/business/rss.xml' ;
$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);
$items=$xmlDoc->getElementsByTagName('item');
foreach($items as $item){
$item_title= $item->getElementsByTagName('title')->item(0)->childNodes->item(0)->nodeValue;
$item_link= $item->getElementsByTagName('link')->item(0)->childNodes->item(0)->nodeValue;
$item_desc= $item->getElementsByTagName('description')->item(0)->childNodes->item(0)->nodeValue;
}
But how can I read the url attribute of the 'media:thumbnail' tag of each item?

Since it has namespaces, use getElementsByTagNameNS() together with ->getAttribute() in this case. Example:
$xml = 'http://feeds.bbci.co.uk/news/video_and_audio/business/rss.xml' ;
$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);
$items = $xmlDoc->getElementsByTagName('item');
foreach($items as $key => $item) {
$item_title= $item->getElementsByTagName('title')->item(0)->childNodes->item(0)->nodeValue;
$item_link= $item->getElementsByTagName('link')->item(0)->childNodes->item(0)->nodeValue;
$item_desc= $item->getElementsByTagName('description')->item(0)->childNodes->item(0)->nodeValue;
$media = $item->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'thumbnail');
foreach($media as $thumb) {
echo $thumb->getAttribute('url') . '<br/>';
}
}
SimpleXMLElement Variant:
$xml = simplexml_load_file('http://feeds.bbci.co.uk/news/video_and_audio/business/rss.xml');
foreach($xml->channel->item as $item) {
$title = $item->title;
$description = $item->description;
$link = $item->link;
$media = $item->children('media', 'http://search.yahoo.com/mrss/');
foreach($media->thumbnail as $thumb) {
echo $thumb->attributes()->url . '<br/>';
}
}

Use XPath. It is part of the DOM extension and lets you use expressions to fetch nodes and values from a DOM. Like XML itself, XPath lets you define prefixes (aliases) for namespaces.
$dom = new DOMDocument;
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
$xpath->registerNamespace('m', 'http://search.yahoo.com/mrss/');
$xpath->registerNamespace('a', 'http://www.w3.org/2005/Atom');
foreach ($xpath->evaluate('//item') as $itemNode) {
$item = [
'title' => $xpath->evaluate('string(title)', $itemNode),
'link' => $xpath->evaluate('string(link)', $itemNode),
'description' => $xpath->evaluate('string(description)', $itemNode),
];
foreach ($xpath->evaluate('m:thumbnail/@url', $itemNode) as $urlAttribute) {
$item['thumbnails'][] = $urlAttribute->value;
}
var_dump($item);
}

Related

Remove all nodes from XML but specific ones in PHP

I have an XML from Google with a content like this:
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<channel>
<title>E-commerce's products.</title>
<description><![CDATA[Clothing and accessories.]]></description>
<link>https://www.ourwebsite.com/</link>
<item>
<title><![CDATA[Product #1 title]]></title>
<g:brand><![CDATA[Product #1 brand]]></g:brand>
<g:mpn><![CDATA[5643785645]]></g:mpn>
<g:gender>Male</g:gender>
<g:age_group>Adult</g:age_group>
<g:size>Unica</g:size>
<g:condition>new</g:condition>
<g:id>fr_30763_06352</g:id>
<g:item_group_id>fr_30763</g:item_group_id>
<link><![CDATA[https://www.ourwebsite.com/product_1_url.htm?mid=62367]]></link>
<description><![CDATA[Product #1 description]]></description>
<g:image_link><![CDATA[https://data.ourwebsite.com/imgprodotto/product-1_big.jpg]]></g:image_link>
<g:sale_price>29.25 EUR</g:sale_price>
<g:price>65.00 EUR</g:price>
<g:shipping_weight>0.5 kg</g:shipping_weight>
<g:featured_product>y</g:featured_product>
<g:product_type><![CDATA[Product #1 category]]></g:product_type>
<g:availability>in stock</g:availability>
<g:availability_date>2022-08-10T00:00-0000</g:availability_date>
<qty>3</qty>
<g:payment_accepted>Visa</g:payment_accepted>
<g:payment_accepted>MasterCard</g:payment_accepted>
<g:payment_accepted>CartaSi</g:payment_accepted>
<g:payment_accepted>Aura</g:payment_accepted>
<g:payment_accepted>PayPal</g:payment_accepted>
</item>
<item>
<title><![CDATA[Product #2 title]]></title>
<g:brand><![CDATA[Product #2 brand]]></g:brand>
<g:mpn><![CDATA[573489547859]]></g:mpn>
<g:gender>Unisex</g:gender>
<g:age_group>Adult</g:age_group>
<g:size>Unica</g:size>
<g:condition>new</g:condition>
<g:id>fr_47362_382936</g:id>
<g:item_group_id>fr_47362</g:item_group_id>
<link><![CDATA[https://www.ourwebsite.com/product_2_url.htm?mid=168192]]></link>
<description><![CDATA[Product #2 description]]></description>
<g:image_link><![CDATA[https://data.ourwebsite.com/imgprodotto/product-2_big.jpg]]></g:image_link>
<g:sale_price>143.91 EUR</g:sale_price>
<g:price>159.90 EUR</g:price>
<g:shipping_weight>8.0 kg</g:shipping_weight>
<g:product_type><![CDATA[Product #2 category]]></g:product_type>
<g:availability>in stock</g:availability>
<g:availability_date>2022-08-10T00:00-0000</g:availability_date>
<qty>1</qty>
<g:payment_accepted>Visa</g:payment_accepted>
<g:payment_accepted>MasterCard</g:payment_accepted>
<g:payment_accepted>CartaSi</g:payment_accepted>
<g:payment_accepted>Aura</g:payment_accepted>
<g:payment_accepted>PayPal</g:payment_accepted>
</item>
...
</channel>
</rss>
I need to produce a XML file purged from all the tags inside <item> except for <g:mpn>, <link>, <g:sale_price> and <qty>.
In the example above, the result should be
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<channel>
<title>E-commerce's products.</title>
<description><![CDATA[Clothing and accessories.]]></description>
<link>https://www.ourwebsite.com/</link>
<item>
<g:mpn><![CDATA[5643785645]]></g:mpn>
<link><![CDATA[https://www.ourwebsite.com/product_1_url.htm?mid=62367]]></link>
<g:sale_price>29.25 EUR</g:sale_price>
<qty>3</qty>
</item>
<item>
<g:mpn><![CDATA[573489547859]]></g:mpn>
<link><![CDATA[https://www.ourwebsite.com/product_2_url.htm?mid=168192]]></link>
<g:sale_price>143.91 EUR</g:sale_price>
<qty>1</qty>
</item>
...
</channel>
</rss>
I've looked at the SimpleXML, DOMDocument and XPath docs, but I couldn't find a way to exclude specific elements. I don't want to select the nodes to delete by name, because Google could add new nodes in the future and my script would not delete them.
I've also tried to loop through namespaced elements with SimpleXML and unset them if not matched with the nodes I have to keep:
$g = $element->children($namespaces['g']); //$element is the SimpleXMLElement of <item> tag
foreach ($g as $gchild) {
if ($gchild->getName() != "mpn") { //for example
unset($gchild);
}
}
but the code above doesn't remove all nodes except for <g:mpn>, for example.
PS: consider the fact that the XML contains both namespaced and not namespaced elements
Thank you in advance.
EDIT:
I've managed to do this with the following code:
$elementsToKeep = array("mpn", "link", "sale_price", "qty");
$domdoc = new DOMDocument();
$domdoc->preserveWhiteSpace = FALSE;
$domdoc->formatOutput = TRUE;
$domdoc->loadXML($myXMLDocument->asXML()); //$myXMLDocument is the SimpleXML document related to the original XML
$xpath = new DOMXPath($domdoc);
foreach ($element->children() as $child) {
$cname = $child->getName();
if (!in_array($cname, $elementsToKeep)) {
foreach($xpath->query('/rss/channel/item/'.$cname) as $node) {
$node->parentNode->removeChild($node);
}
}
}
$g = $element->children($namespaces['g']);
foreach ($g as $gchild) {
$gname = $gchild->getName();
if (!in_array($gname, $elementsToKeep)) {
foreach($xpath->query('/rss/channel/item/g:'.$gname) as $node) {
$node->parentNode->removeChild($node);
}
}
}
I've used DOMDocument and DOMXPath with two loops, one over the non-namespaced tags and one over the namespaced tags, so that I can call removeChild on each matching node.
Is there really no cleaner solution? Thanks again.
Somewhat simpler:
$items = $xpath->query('//item');
foreach($items as $item) {
$targets = $xpath->query('.//*',$item);
foreach($targets as $target) {
if (!in_array($target->localName, $elementsToKeep)) {
$target->parentNode->removeChild($target);
}
}
}
Use XPath to express all child elements you want to remove.
Then use the library of your choice to remove the elements.
SimpleXMLElement example:
$sxe = simplexml_load_string($xml);
foreach ($sxe->xpath('//item/*[
not(
name() = "g:mpn"
or name() = "link"
or name() = "g:sale_price"
or name() = "qty"
)
]') as $child) unset($child[0]);
echo $sxe->asXML(), "\n";
DOMDocument example:
This one is mostly identical to the previous example, with a variation on the XPath expression that matches the elements by namespace URI instead of by prefix. That keeps it from breaking when the namespace prefix changes (the same expression also works in the SimpleXMLElement example):
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//item/*[
not(
(local-name() = "mpn" and namespace-uri() = "http://base.google.com/ns/1.0")
or (local-name() = "link" and namespace-uri() = "")
or (local-name() = "sale_price" and namespace-uri() = "http://base.google.com/ns/1.0")
or (local-name() = "qty" and namespace-uri() = "")
)
]') as $child) {
$child->parentNode->removeChild($child);
}
echo $doc->saveXML(), "\n";
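If the keep-list is likely to change, the XPath predicate can be replaced by a lookup table. The following is only a minimal sketch of that idea, not a drop-in replacement: pruneItems() and the shape of $keep (namespace URI => local names, with '' meaning "no namespace") are hypothetical names made up for this illustration, and the inline XML is a shortened stand-in for the feed in the question.

```php
<?php
// Hypothetical helper: removes every child of <item> that is not whitelisted.
// $keep maps a namespace URI ('' = no namespace) to the local names to keep.
function pruneItems(string $xml, array $keep): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;
    $doc->formatOutput = true;
    $doc->loadXML($xml);

    $xpath = new DOMXPath($doc);
    // query() returns a static list, so nodes can be removed while iterating.
    foreach ($xpath->query('//item/*') as $child) {
        // namespaceURI is null for elements without a namespace.
        $ns = (string) $child->namespaceURI;
        if (!in_array($child->localName, $keep[$ns] ?? [], true)) {
            $child->parentNode->removeChild($child);
        }
    }
    return $doc->saveXML();
}

// Shortened sample feed (assumption: same structure as the question's XML).
$xml = <<<'XML'
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<channel>
<item>
<title>Product #1 title</title>
<g:mpn>5643785645</g:mpn>
<link>https://www.ourwebsite.com/product_1_url.htm</link>
<g:sale_price>29.25 EUR</g:sale_price>
<g:price>65.00 EUR</g:price>
<qty>3</qty>
</item>
</channel>
</rss>
XML;

$keep = [
    'http://base.google.com/ns/1.0' => ['mpn', 'sale_price'],
    ''                              => ['link', 'qty'],
];

$result = pruneItems($xml, $keep);
echo $result;
```

Because the check uses the namespace URI and local name, elements Google adds later are removed by default, and a changed prefix does not matter.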

PHP SimpleXML: How to access nested namespaces?

Given this XML structure:
$xml = '<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<item>
<title>Title</title>
<media:group>
<media:content url="url1" />
<media:content url="url2" />
</media:group>
</item>
<item>
<title>Title2</title>
<media:group>
<media:content url="url1" />
<media:content url="url2" />
</media:group>
</item>
</channel>
</rss>';
$xml_data = new SimpleXMLElement($xml);
How do I access the attributes of the media:content nodes? I tried
foreach ($xml_data->channel->item as $key => $data) {
$urls = $data->children('media', true)->children('media', true);
print_r($urls);
}
and
foreach ($xml_data->channel->item as $key => $data) {
$ns = $xml->getNamespaces(true);
$urls = $data->children('media', true)->children($ns['media']);
print_r($urls);
}
as per other answers, but they both return empty SimpleXMLElements.
When you echo out XML with SimpleXML, you need to use asXML() to see the real content. print_r() builds its own representation and doesn't show all the content...
foreach ($xml_data->channel->item as $key => $data) {
$urls = $data->children('media', true)->children('media', true);
echo $urls->asXML().PHP_EOL;
}
echoes out...
<media:content url="url1"/>
<media:content url="url1"/>
It only outputs the first one of each group; to go through all of the child nodes of each element you need to add another foreach.
foreach ($xml_data->channel->item as $key => $data) {
echo $data->title.PHP_EOL;
foreach ( $data->children('media', true)->children('media', true) as $content ) {
echo $content->asXML().PHP_EOL;
}
}
outputs..
Title
<media:content url="url1"/>
<media:content url="url2"/>
Title2
<media:content url="url1"/>
<media:content url="url2"/>
To access a particular attribute (so for example the url attribute from the second code example) you have to use the attributes() method...
echo $content->attributes()['url'];
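Putting the pieces together, here is a small self-contained sketch that collects the url attributes per item. It assumes the same structure as the XML in the question (trimmed to one item); the constant MEDIA_NS and the array $urlsByTitle are names chosen for this example only.

```php
<?php
// Trimmed copy of the question's feed (one item only).
$xml = <<<'XML'
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<item>
<title>Title</title>
<media:group>
<media:content url="url1" />
<media:content url="url2" />
</media:group>
</item>
</channel>
</rss>
XML;

const MEDIA_NS = 'http://search.yahoo.com/mrss/';

$feed = new SimpleXMLElement($xml);
$urlsByTitle = [];
foreach ($feed->channel->item as $item) {
    $title = (string) $item->title;
    $urlsByTitle[$title] = [];
    // children() with the namespace URI exposes <media:group> and its <media:content> children.
    foreach ($item->children(MEDIA_NS)->group->content as $content) {
        // The url attribute has no namespace, so attributes() without arguments finds it.
        $urlsByTitle[$title][] = (string) $content->attributes()['url'];
    }
}
print_r($urlsByTitle);
```

Passing the namespace URI instead of the prefix keeps the code working if the feed ever declares the same namespace under a different prefix.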

Parse feed media group with array of children

I have this XML feed:
<item>
<title>Title</title>
<media:group>
<media:content url="http://example.it/image.jpg" type="image/jpeg">
<media:thumbnail url="http://example.it/image.jpg" type="image/png"/>
<media:credit>Credit</media:credit>
</media:content>
<media:content url="http://example.it/image2.jpg" type="image/jpeg">
<media:thumbnail url="http://example.it/image2.jpg" type="image/png"/>
<media:credit>Credit2</media:credit>
</media:content>
</media:group>
</item>
This is my PHP code to read it:
$rss = new SimpleXMLElement($url);
foreach ($rss->channel->item as $item) {
$title = $item->title;
}
No problem reading "title" item, but how can I read "url", "thumbnail", "credit" for each media:content?
-------SOLVED-------
$rss = new SimpleXMLElement($url);
foreach ($rss->channel->item as $item) {
$title = $item->title;
$gallerie = $item->children('http://search.yahoo.com/mrss/')->group->content;
foreach($gallerie as $g) {
echo $g->attributes()['url'] ."<br/>";
}
}
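The solution above prints only the url. A possible way to also read the thumbnail and the credit is sketched below. It is an illustration under assumptions: the thumbnail element is taken to be named media:thumbnail (as in the Media RSS namespace), every media:content is assumed to have one, and the inline XML, MEDIA_NS and $gallery are made up for this example.

```php
<?php
// Sample data modelled on the feed in the question (assumed structure).
$xml = <<<'XML'
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<item>
<title>Title</title>
<media:group>
<media:content url="http://example.it/image.jpg" type="image/jpeg">
<media:thumbnail url="http://example.it/thumb.png" type="image/png"/>
<media:credit>Credit</media:credit>
</media:content>
<media:content url="http://example.it/image2.jpg" type="image/jpeg">
<media:thumbnail url="http://example.it/thumb2.png" type="image/png"/>
<media:credit>Credit2</media:credit>
</media:content>
</media:group>
</item>
</channel>
</rss>
XML;

const MEDIA_NS = 'http://search.yahoo.com/mrss/';

$rss = new SimpleXMLElement($xml);
$gallery = [];
foreach ($rss->channel->item as $item) {
    foreach ($item->children(MEDIA_NS)->group->content as $content) {
        // Ask explicitly for the namespaced children of <media:content>.
        $media = $content->children(MEDIA_NS);
        $gallery[] = [
            'url'       => (string) $content->attributes()['url'],
            'thumbnail' => (string) $media->thumbnail->attributes()['url'],
            'credit'    => (string) $media->credit,
        ];
    }
}
print_r($gallery);
```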

PHP xml array parsing data

Hey all, I have this type of XML that I am trying to get data from. This is just a snippet of the larger XML document:
<entry>
<id>http://www.google.com/calendar/feeds/[Letters/numbers here]group.calendar.google.com/public/basic/[Letters/numbers here]</id>
<published>2013-08-01T13:40:24.000Z</published>
<updated>2013-08-01T13:40:24.000Z</updated>
<title type='html'>[Title Here]</title>
<summary type='html'>When: Tue Sep 24, 2013 7am</summary>
<content type='html'>When: Tue Sep 24, 2013 7am
<br />Event Status: confirmed
</content>
<link rel='alternate' type='text/html' href='https://www.google.com/calendar/event?eid=[Letters/numbers here]' title='alternate'/>
<link rel='self' type='application/atom+xml' href='https://www.google.com/calendar/feeds/[Letters/numbers here]group.calendar.google.com/public/basic/[Letters/numbers here]'/>
<author>
<name>[email here]</name>
<email>[email here]</email>
</author>
</entry>
etc... etc....
Currently I can get both published and updated just fine by doing the following:
<?php
$url = strtolower($_GET['url']);
$doc = new DOMDocument();
$doc->load('http://www.google.com/calendar/feeds/[number/letters here].calendar.google.com/public/basic');
$entries = $doc->getElementsByTagName("entry");
foreach ($entries as $entry) {
$tmpPublished = $entry->getElementsByTagName("published");
$published = $tmpPublished->item(0)->nodeValue;
$tmpUpdated = $entry->getElementsByTagName("updated");
$updated = $tmpUpdated->item(0)->nodeValue;
}
?>
However, I am unsure how to get data held in an attribute of a child element, the link element in this case.
So i need to get
link->href
I would imagine it would be:
$tmpLink = $entry->getElementsByTagName("link");
$link = $tmpLink->item( 2 )->nodeValue;
Any help would be great!
You can use:
$links = $doc->getElementsByTagName("link");
foreach ($links as $link) {
$href = $link->getAttribute("href");
}
if you want to get the href of every link element in the document. Hope I understood what you wanted :)
You can do this with simplexml_load_string, as in the following code:
$entries = simplexml_load_string($string);
foreach ($entries as $entry) {
echo $entry->published;
echo $entry->updated;
foreach($entry->link as $link)
{
echo $link->attributes()->type;
echo $link->attributes()->rel;
}
}
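To get only the href of the rel='alternate' link of each entry with DOM, an XPath query scoped to the entry can help. This is a rough sketch under assumptions: the snippet in the question does not show the root element, so the feed is assumed to be Atom with the default namespace http://www.w3.org/2005/Atom; the inline XML and its example.com URLs are placeholders. If the real feed has no namespace, drop the a: prefixes.

```php
<?php
// Placeholder feed (assumption: Atom default namespace on the root element).
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
<published>2013-08-01T13:40:24.000Z</published>
<updated>2013-08-01T13:40:24.000Z</updated>
<link rel="alternate" type="text/html" href="https://example.com/event?eid=1" title="alternate"/>
<link rel="self" type="application/atom+xml" href="https://example.com/feeds/1"/>
</entry>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
// XPath has no default namespace, so the Atom namespace needs a prefix.
$xpath->registerNamespace('a', 'http://www.w3.org/2005/Atom');

$hrefs = [];
foreach ($xpath->query('//a:entry') as $entry) {
    // string() returns the attribute value directly, or '' if there is no such link.
    $hrefs[] = $xpath->evaluate('string(a:link[@rel="alternate"]/@href)', $entry);
}
print_r($hrefs);
```

Selecting by the rel attribute avoids depending on the position of the link element inside the entry.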

Reading itunes XML file with PHP DOM method

I'm having some trouble in getting information from my itunes XML feed which you can peek at here: http://c3carlingford.org.au/podcast/C3CiTunesFeed.xml
I need to get the information from each of the inner <item> tags. An example of one of these is as follows:
<item>
<title>What to do when a viper bites you</title>
<itunes:subtitle/>
<itunes:summary/>
<!-- 4000 Characters Max ******** -->
<itunes:author>Ps. Phil Buechler</itunes:author>
<itunes:image href="http://www.c3carlingford.org.au/podcast/itunes_cover_art.jpg"/>
<enclosure url="http://www.ccccarlingford.org.au/podcast/C3C-20120722PM.mp3" length="14158931" type="audio/mpeg"/>
<guid isPermaLink="false">61bc701c-b374-40ea-bc36-6c1cdaae8042</guid>
<pubDate>Sun, 22 Jul 2012 19:30:00 +1100</pubDate>
<itunes:duration>40:01</itunes:duration>
<itunes:keywords>
Worship, Reach, Build, Holy Spirit, Worship, C3 Carlingford
</itunes:keywords>
</item>
Now, I have had some success!
I have been able to get the title out:
<?php
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->load('http://c3carlingford.org.au/podcast/C3CiTunesFeed.xml');
$items = $dom->getElementsByTagName('item');
foreach($items as $item){
$title = $item->getElementsByTagName('title')->item(0)->nodeValue;
echo $title . '<br />';
};
?>
But I can't seem to get anything else out... I'm new to all this!
What I need to get out includes:
The <itunes:author> value.
The url attribute value from the <enclosure> tag.
Could someone help me get these two values out?
You can use DOMXPath to do this and make your life a lot easier:
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadXML( $xml); // $xml = file_get_contents( "http://www.c3carlingford.org.au/podcast/C3CiTunesFeed.xml")
// Initialize XPath
$xpath = new DOMXpath( $doc);
// Register the itunes namespace
$xpath->registerNamespace( 'itunes', 'http://www.itunes.com/dtds/podcast-1.0.dtd');
$items = $doc->getElementsByTagName('item');
foreach( $items as $item) {
$title = $xpath->query( 'title', $item)->item(0)->nodeValue;
$author = $xpath->query( 'itunes:author', $item)->item(0)->nodeValue;
$enclosure = $xpath->query( 'enclosure', $item)->item(0);
$url = $enclosure->attributes->getNamedItem('url')->value;
echo "$title - $author - $url\n";
}
For the item shown above, this will output:
What to do when a viper bites you - Ps. Phil Buechler - http://www.ccccarlingford.org.au/podcast/C3C-20120722PM.mp3
Yes, you can do it with SimpleXML.
Here is some sample code:
<?php
$x = simplexml_load_file("http://c3carlingford.org.au/podcast/C3CiTunesFeed.xml");
foreach ($x->channel->item as $item) {
$otherNode = $item->children('http://www.itunes.com/dtds/podcast-1.0.dtd');
echo $item->title .'---'.$otherNode->author;
echo "\n";
}
?>
Output:
What to do when a viper bites you---Ps. Phil Buechler
Living Water, Let the River Flow!---Ps. Phil Buechler
The Call of God to Forgive One Another AM & PM---Ps. Richard Botta
The Call of God to Evangelise AM & PM---Rob Waugh
The Call of God to Love One Another AM & PM---Rob Waugh
Hope this helps!
You can use SimpleXML's children() method:
$item->children('itunes', TRUE);
This gives you a SimpleXMLElement holding all the itunes-prefixed children of the item (itunes:duration, itunes:subtitle, ...):
<?php
$x = simplexml_load_file("http://c3carlingford.org.au/podcast/C3CiTunesFeed.xml");
foreach ($x->channel->item as $item) {
$otherNode = $item->children('itunes', TRUE);
echo $otherNode->duration;
echo "\n";
echo $otherNode->author;
echo "\n";
echo $otherNode->subtitle;
echo "\n";
}
?>
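The SimpleXML answers above cover the itunes:author value but not the enclosure url. A minimal sketch that reads both follows; it uses a trimmed inline copy of the item from the question instead of the live feed, and the array $episodes is a name chosen for this example.

```php
<?php
// Trimmed copy of the item from the question (assumed to sit in rss/channel).
$xml = <<<'XML'
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
<channel>
<item>
<title>What to do when a viper bites you</title>
<itunes:author>Ps. Phil Buechler</itunes:author>
<enclosure url="http://www.ccccarlingford.org.au/podcast/C3C-20120722PM.mp3" length="14158931" type="audio/mpeg"/>
</item>
</channel>
</rss>
XML;

$feed = simplexml_load_string($xml);
$episodes = [];
foreach ($feed->channel->item as $item) {
    // Namespaced children are reached through the namespace URI.
    $itunes = $item->children('http://www.itunes.com/dtds/podcast-1.0.dtd');
    $episodes[] = [
        'title'  => (string) $item->title,
        'author' => (string) $itunes->author,
        // <enclosure> has no namespace, so array access reads its attribute directly.
        'url'    => (string) $item->enclosure['url'],
    ];
}
print_r($episodes);
```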
