Trying to Parse Images and Text from an RSS Feed - php

This is a continuation of the thread here: Trying to Parse Only the Images from an RSS Feed
This time I want to parse both the images and certain items from an RSS feed. A sample of the feed looks like this:
<channel>
<atom:link href="http://mywebsite.com/rss" rel="self" type="application/rss+xml" />
<item>
<title>Article One</title>
<guid isPermaLink="true">http://mywebsite.com/details/e8c5106</guid>
<link>http://mywebsite.com/geturl/e8c5106</link>
<comments>http://mywebsite.com/details/e8c5106#comments</comments>
<pubDate>Wed, 09 Jan 2013 02:59:45 -0500</pubDate>
<category>Category 1</category>
<description>
<![CDATA[<div>
<img src="http://mywebsite.com/myimages/1521197-main.jpg" width="120" border="0" />
<ul><li>Poster: someone's name;</li>
<li>PostDate: Tue, 08 Jan 2013 21:49:35 -0500</li>
<li>Rating: 5</li>
<li>Summary:Lorem ipsum dolor </li></ul></div><div style="clear:both;">]]>
</description>
</item>
<item>..
I have the following code below where I try to parse image and text:
$xml = simplexml_load_file('http://mywebsite.com/rss?t=2040&dl=1&i=1');
$descriptions = $xml->xpath('//item/description');
$mytitle= $xml->xpath('//item/title');
foreach ( $descriptions as $description_node ) {
// The description may not be valid XML, so use a more forgiving HTML parser mode
$description_dom = new DOMDocument();
$description_dom->loadHTML( (string)$description_node );
// Switch back to SimpleXML for readability
$description_sxml = simplexml_import_dom( $description_dom );
// Find all images, and extract their 'src' param
$imgs = $description_sxml->xpath('//img');
foreach($imgs as $image) {
echo "<img id=poster class=poster src={$image['src']}> {$mytitle}";
}
}
The above code extracts the images beautifully.... However, it does not extract the $mytitle (which would be "Article One") tag as I try on the last line of my code. This is supposed to extract from all items in the RSS feed.
Can anyone help me figure this one out please.
Many thanks,
Hernando

xpath() always returns an array (see http://www.php.net/manual/en/simplexmlelement.xpath.php), even if the result is a single element. If you expect exactly one element, you can simply use $mytitle[0].
You will have to iterate over each <item/> element, as otherwise you can't know which description and which title belong together. So the following should work:
$xml = simplexml_load_file('test.xml');
$items = $xml->xpath('//item');
foreach ( $items as $item) {
$descriptions = $item->description;
$mytitle = $item->title;
foreach ( $descriptions as $description_node ) {
// The description may not be valid XML, so use a more forgiving HTML parser mode
$description_dom = new DOMDocument();
$description_dom->loadHTML( (string)$description_node );
// Switch back to SimpleXML for readability
$description_sxml = simplexml_import_dom( $description_dom );
// Find all images, and extract their 'src' param
$imgs = $description_sxml->xpath('//img');
foreach($imgs as $image) {
echo "<img id=\"poster\" class=\"poster\" src=\"{$image['src']}\"> {$mytitle}";
}
}
}
By the way, I also added quotes ("") around the attribute values of your <img/> element. I guess you want that, as this looks very much like XML/HTML.
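The per-item pairing described above can also be sketched as a self-contained script. This is only an illustration under assumptions: the inline feed, the second "Article Two" item, and the helper name extractTitledImages() are made up for the example and are not part of the original feed or code. It escapes the values before echoing them and collects HTML parser warnings internally instead of printing them.

```php
<?php
// Hypothetical inline feed standing in for the real RSS URL.
$feed = <<<'XML'
<rss version="2.0"><channel>
<item>
<title>Article One</title>
<description><![CDATA[<div><img src="http://mywebsite.com/myimages/1521197-main.jpg" width="120" border="0" /></div>]]></description>
</item>
<item>
<title>Article Two</title>
<description><![CDATA[<p>No image here</p>]]></description>
</item>
</channel></rss>
XML;

// Hypothetical helper: returns one ['title' => ..., 'src' => ...] pair per image found.
function extractTitledImages(string $feedXml): array
{
    $pairs = [];
    $xml = simplexml_load_string($feedXml);
    foreach ($xml->xpath('//item') as $item) {
        $html = trim((string) $item->description);
        if ($html === '') {
            continue; // nothing to parse for this item
        }
        $dom = new DOMDocument();
        // Collect HTML parse warnings internally instead of printing them
        $previous = libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        foreach ($dom->getElementsByTagName('img') as $img) {
            $pairs[] = [
                'title' => (string) $item->title,
                'src'   => $img->getAttribute('src'),
            ];
        }
    }
    return $pairs;
}

$pairs = extractTitledImages($feed);
foreach ($pairs as $pair) {
    // Escape both values before putting them into HTML output
    printf(
        "<img class=\"poster\" src=\"%s\" alt=\"\"> %s\n",
        htmlspecialchars($pair['src']),
        htmlspecialchars($pair['title'])
    );
}
```

Returning the pairs from a function instead of echoing inside the loops keeps the title and image together and makes the result easy to reuse in a template.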

Related

how to get xml all nodes (tags only) from xml file using php? [duplicate]

I have XML data which looks like this:
<channel>
<title>-----</title>
<link>------</link>
<description>---</description>
<lastBuildDate>Tue, 27 Sep 2011 16:37:01 +0000</lastBuildDate>
<language>en</language>
<generator>-------</generator>
<item>
<title>-----</title>
<link>-------</link>
<comments>------</comments>
<pubDate>Tue, 27 Sep 2011 16:37:01 +0000</pubDate>
<category>-----</category>
</item>
<item>
<title>-----</title>
<link>-------</link>
<comments>------</comments>
<pubDate>Tue, 27 Sep 2011 </pubDate>
<category>-----</category>
</item>
<item>
<title>-----</title>
<link>-------</link>
<comments>------</comments>
<pubDate>Wed, 28 Sep 2011 16:37:01 +0000</pubDate>
<category>-----</category>
</item>
</channel>
From this I need to retrieve all the existing XML tag names to show to the user, such as channel, title, link, item, etc., including their child tags; that is, every tag that exists in the XML file.
I need help doing that in PHP. I have used DOM and SimpleXML, but I can only get the values of a specific tag if I already know its name.
I actually need to work with many XML files whose structure and tags I don't know in advance, so I need to get the existing tag names and display them to the user, who can then select the tags he needs.
I need suggestions for doing this in PHP.
Thanks in advance.
Here's my stab at it:
$doc = new DOMDocument();
$doc->loadXML( $yourXmlString );
// or, to read from a file or URL instead:
// $doc->load( $yourXmlUrl );
$xpath = new DOMXpath( $doc );
$nodes = $xpath->query( '//*' );
$nodeNames = array();
foreach( $nodes as $node )
{
$nodeNames[ $node->nodeName ] = $node->nodeName;
}
var_dump( $nodeNames );
I wish I could have made the xpath expression a little more efficient though, but I can't think of anything. That's why I continuously overwrite the key of $nodeNames.
Come to think of it: perhaps I've misunderstood your question and you don't want the unique element names at all, but want literally all elements. If so: how do you want them? As strings? Including their full path?
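If full paths are wanted as well, one possible approach is DOMNode::getNodePath(). The following is a rough sketch, not a definitive answer: the sample document and the helper name collectTagInfo() are assumptions made for illustration, and the positional indexes such as [2] are stripped so that repeated <item> elements collapse into a single path.

```php
<?php
// Hypothetical sample document; substitute your own XML string.
$xmlString = <<<'XML'
<channel>
<title>t</title>
<item><title>a</title><link>l</link></item>
<item><title>b</title><category>c</category></item>
</channel>
XML;

// Hypothetical helper: returns the unique tag names and the unique element paths.
function collectTagInfo(string $xmlString): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xmlString);
    $xpath = new DOMXPath($doc);

    $names = [];
    $paths = [];
    foreach ($xpath->query('//*') as $node) {
        // Using the name as array key removes duplicates
        $names[$node->nodeName] = true;
        // getNodePath() includes positional indexes for repeated siblings;
        // strip them so repeated elements map to the same path
        $path = preg_replace('/\[\d+\]/', '', $node->getNodePath());
        $paths[$path] = true;
    }

    return [
        'names' => array_keys($names),
        'paths' => array_keys($paths),
    ];
}

$info = collectTagInfo($xmlString);
print_r($info);
```

The path list shows the user where each tag lives, which helps when the same name (title, for instance) appears at several levels.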
Here's a simple example of how to use PHP's SimpleXML library as found at http://php.net/manual/en/simplexml.examples-basic.php
<?php
include 'example.php';
$movies = new SimpleXMLElement($xmlstr);
/* Access the <rating> nodes of the first movie.
* Output the rating scale, too. */
foreach ($movies->movie[0]->rating as $rating) {
switch((string) $rating['type']) { // Get attributes as element indices
case 'thumbs':
echo $rating, ' thumbs up';
break;
case 'stars':
echo $rating, ' stars';
break;
}
}
?>

PHP get img src from xml

I have a page with xml that looks like:
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
<channel>
<title>FB-RSS feed for Salman Khan Fc</title>
<link>http://facebook.com/profile.php?id=1636293749919827/</link>
<description>FB-RSS feed for Salman Khan Fc</description>
<managingEditor>http://fbrss.com (FB-RSS)</managingEditor>
<pubDate>31 Mar 16 20:00 +0000</pubDate>
<item>
<title>Photo - Who is the Best Khan ?</title>
<link>https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3</link>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/11059765_1713146978901170_8711054263905505442_n.jpg?oh=fa2978c5ecfb3ae424e9082aaa057b8f&amp;oe=57BB41D5"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;Who is the Best Khan ?</description>
<author>FB-RSS</author>
<guid>1636293749919827_1713146978901170</guid>
<pubDate>31 Mar 16 20:00 +0000</pubDate>
</item>
<item>
<title>Photo</title>
<link>https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3</link>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/12294686_1713146755567859_6728330714340999478_n.jpg?oh=6d90a688fdf4342f9e12e9ff9a66b127&amp;oe=57778068"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;</description>
<author>FB-RSS</author>
<guid>1636293749919827_1713146755567859</guid>
<pubDate>31 Mar 16 19:58 +0000</pubDate>
</item>
</channel>
</rss>
I want to get the srcs of the imgs in the XML above.
The images are stored in the <description>; however, they are not in the form
<img...
but rather look like:
&lt;img src="https://scontent.xx.fbc... .
The < is replaced with &lt;, which I guess is why $imgs = $dom->getElementsByTagName('img'); returns nothing.
Is there any workaround?
This is how I call it:
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadXML( $xml_file);
$imgs = ...(get the imgs to extract the src...('img') ??;
//Then run a possible foreach
//something like:
foreach($imgs as $img){
$src= ///the src of the $img
//try it out
echo '<img src="'.$src.'" /> <br />',
}
Any Idea?
You have HTML embedded in XML elements, so you have to retrieve the XML nodes, load each node's HTML, and then read the desired tag attribute.
Your XML contains <description> nodes at different levels, so ->getElementsByTagName will return more than the nodes you want. Use DOMXPath to retrieve only the <description> nodes in the right tree position:
$dom = new DOMDocument();
libxml_use_internal_errors( True );
$dom->loadXML( $xml );
$dom->formatOutput = True;
$xpath = new DOMXPath( $dom );
$nodes = $xpath->query( 'channel/item/description' );
Then iterate over all nodes, load each node value into a new DOMDocument (no need to decode HTML entities, DOM has already decoded them for you), and extract the src attribute from the <img> node:
foreach( $nodes as $node )
{
$html = new DOMDocument();
$html->loadHTML( $node->nodeValue );
$src = $html->getElementsByTagName( 'img' )->item(0)->getAttribute('src');
}
eval.in demo
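Put together, the two steps above might look like the sketch below. It rests on assumptions: the inline feed with example.com URLs and the helper name extractImageSources() are invented for illustration. Unlike the snippet above, it collects every <img> in each description and skips descriptions that contain none, so item(0) is never called on an empty list.

```php
<?php
// Hypothetical feed with entity-escaped HTML inside <description>.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
<channel>
<description>Feed description without images</description>
<item><description>&lt;a href="https://example.com/p/1"&gt;&lt;img src="https://example.com/a.jpg"&gt;&lt;/a&gt;&lt;br&gt;Caption</description></item>
<item><description>Text only</description></item>
</channel>
</rss>
XML;

// Hypothetical helper: returns the src of every <img> found in item descriptions.
function extractImageSources(string $xml): array
{
    // Keep libxml warnings (from sloppy HTML) out of the output
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);

    $sources = [];
    // Absolute path, so the channel-level <description> is not matched
    foreach ($xpath->query('/rss/channel/item/description') as $node) {
        // nodeValue is already entity-decoded, i.e. it is plain HTML text
        $fragment = trim($node->nodeValue);
        if ($fragment === '') {
            continue;
        }
        $html = new DOMDocument();
        $html->loadHTML($fragment);
        foreach ($html->getElementsByTagName('img') as $img) {
            $sources[] = $img->getAttribute('src');
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $sources;
}

$sources = extractImageSources($xml);
foreach ($sources as $src) {
    echo '<img src="' . htmlspecialchars($src) . '" /> <br />', "\n";
}
```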

Trying to Parse Only the Images from an RSS Feed

First, I am a php newbie. I have looked at the question and solution here. For my needs however, the parsing does not go deep enough into the various articles.
A small sampling of my rss feed reads like this:
<channel>
<atom:link href="http://mywebsite.com/rss" rel="self" type="application/rss+xml" />
<title>My Web Site</title>
<description>My Feed</description>
<link>http://mywebsite.com/</link>
<image>
<url>http://mywebsite.com/views/images/banner.jpg</url>
<title>My Title</title>
<link>http://mywebsite.com/</link>
<description>Visit My Site</description>
</image>
<item>
<title>Article One</title>
<guid isPermaLink="true">http://mywebsite.com/details/e8c5106</guid>
<link>http://mywebsite.com/geturl/e8c5106</link>
<comments>http://mywebsite.com/details/e8c5106#comments</comments>
<pubDate>Wed, 09 Jan 2013 02:59:45 -0500</pubDate>
<category>Category 1</category>
<description>
<![CDATA[<div>
<img src="http://mywebsite.com/myimages/1521197-main.jpg" width="120" border="0" />
<ul><li>Poster: someone's name;</li>
<li>PostDate: Tue, 08 Jan 2013 21:49:35 -0500</li>
<li>Rating: 5</li>
<li>Summary:Lorem ipsum dolor </li></ul></div><div style="clear:both;">]]>
</description>
</item>
<item>..
The image links that I want to parse out are the ones way inside each Item > Description
The code in my php file reads:
<?php
$xml = simplexml_load_file('http://mywebsite.com/rss?t=2040&dl=1&i=1&r=ceddfb43483437b1ed08ab8a72cbc3d5');
$imgs = $xml->xpath('/item/description/img');
foreach($imgs as $image) {
echo $image->src;
}
?>
Can someone please help me figure out how to configure the php code above?
Also a very newbie question... once I get the resulting image urls, how can I display the images in a row on my html?
Many thanks!!!
Hernando
The <img> tags inside that RSS feed are not actually elements of the XML document, contrary to the syntax highlighting on this site - they are just text inside the <description> element which happen to contain the characters < and >.
The string <![CDATA[ tells the XML parser that everything from there until it encounters ]]> is to be treated as a raw string, regardless of what it contains. This is useful for embedding HTML inside XML, since the HTML tags wouldn't necessarily be valid XML. It is equivalent to escaping the whole HTML (e.g. with htmlspecialchars) so that the <img> tags would look like &lt;img&gt;. (I went into more technical details on another answer.)
So to extract the images from the RSS requires two steps: first, get the text of each <description>, and second, find all the <img> tags in that text.
$xml = simplexml_load_file('http://mywebsite.com/rss?t=2040&dl=1&i=1&r=ceddfb43483437b1ed08ab8a72cbc3d5');
$descriptions = $xml->xpath('//item/description');
foreach ( $descriptions as $description_node ) {
// The description may not be valid XML, so use a more forgiving HTML parser mode
$description_dom = new DOMDocument();
$description_dom->loadHTML( (string)$description_node );
// Switch back to SimpleXML for readability
$description_sxml = simplexml_import_dom( $description_dom );
// Find all images, and extract their 'src' param
$imgs = $description_sxml->xpath('//img');
foreach($imgs as $image) {
echo (string)$image['src'];
}
}
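The claim that a CDATA section is equivalent to escaping the HTML can be checked with a short experiment. This is just a sketch using two made-up one-element documents; it shows that both forms give the same string and that neither produces child elements, which is why an XPath query for img on the feed itself finds nothing.

```php
<?php
// Two hypothetical documents: the same HTML, once in CDATA and once entity-escaped.
$withCdata   = '<description><![CDATA[<img src="a.jpg">]]></description>';
$withEscapes = '<description>&lt;img src="a.jpg"&gt;</description>';

// Casting the element to string yields its text content in both cases
$a = (string) simplexml_load_string($withCdata);
$b = (string) simplexml_load_string($withEscapes);

// Both documents carry the identical text
var_dump($a === $b);

// The <img> is text, not an element: <description> has no children
$children = count(simplexml_load_string($withCdata)->children());
var_dump($children);
```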
I don't have much experience with XPath, but you could try the following:
$imgs = $xml->xpath('//item//img');
This selects all img elements that are inside item elements, regardless of whether other elements sit in between. The leading // searches for item anywhere in the document, not just directly below the root; otherwise, you'd need something like /rss/channel/item.... Note that this only matches img tags that are real elements of the XML document; tags inside a CDATA section are plain text and will not be found this way.
As for displaying the images: just output <img> tags followed by line breaks, like so:
foreach($imgs as $image) {
// Attributes are read with array syntax, not as properties
echo '<img src="' . $image['src'] . '" /><br />';
}
The preferred way would be to use CSS instead of <br>-tags, but I think they are simpler for a start.

Parsing meta data in RSS feed PHP

I am trying to extract the IMG SRC value out of the RSS feed below (only part of the feed is shown).
I am currently using an XML parser to get the rest of the items, which works fine, e.g.:
foreach($xml['RSS']['CHANNEL']['ITEM'] as $item)
{
...
$title = $item['TITLE'];
$description = $item['DESCRIPTION'];
$link = $item['LINK'];
$desc_imgsrc = <how do i get this for below RSS feed??>;
...
}
However, how do I get the IMG SRC value from the RSS feed below into a PHP variable? Specifically, I am trying to extract the "http://thumbnails.---.com/VCPS/sm.jpg" string into the $desc_imgsrc variable above. How can I adapt the code above to do that? Thanks.
<item>
<title>Electric Cars - all about them</title>
<metadata:title xmlns:metadata="http://search.--.com/rss/2.0/Metadata">This is the title metadata</metadata:title>
<description>This is the description</description>
<metadata:description xmlns:metadata="http://search.---.com/rss/2.0/">
<![CDATA[<div class="rss_image" style="float:left;padding-right:10px;"><img border="0" vspace="0" hspace="0" width="10" src="http://thumbnails.---.com/VCPS/sm.jpg"></div><div class="rss_abstract" style="font:Arial 12px;width:100%;float:left;clear:both">This is the description</div>]]></metadata:description>
<pubDate>Fri, 25 Nov 2011 07:00 GMT</pubDate>
This is HTML inside an XML CDATA section. CDATA (character data) is not parsed by the XML parser, so you need to extract the value the same way you did with the other elements. Then you can parse that value, either with a regular expression or, even better, with an HTML parser such as DOMDocument::loadHTML (an XML parser only works if the HTML is also valid XML).
$doc = new DOMDocument();
// $html is the HTML string taken from the CDATA section;
// @ suppresses the warnings caused by the mixture of XML and HTML
@$doc->loadHTML($html);
$items = $doc->getElementsByTagName('img');
foreach ($items as $item)
{
$src = $item->getAttribute('src');
}
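For completeness, here is one way the whole extraction could look with SimpleXML rather than the array-based parser from the question. Treat it as a sketch under assumptions: the namespace URI http://example.com/rss/2.0/Metadata stands in for the redacted one in the feed, and firstImageSrc() is a hypothetical helper name. Elements with a prefix such as metadata:description are reached through children() with the namespace URI.

```php
<?php
// Hypothetical feed; the namespace URI is a placeholder for the redacted one.
$feed = <<<'XML'
<rss version="2.0" xmlns:metadata="http://example.com/rss/2.0/Metadata"><channel>
<item>
<title>Electric Cars - all about them</title>
<metadata:description><![CDATA[<div class="rss_image"><img border="0" width="10" src="http://example.com/VCPS/sm.jpg"></div><div class="rss_abstract">This is the description</div>]]></metadata:description>
</item>
</channel></rss>
XML;

// Hypothetical helper: returns the src of the first <img> in an HTML string, or null.
function firstImageSrc(string $html): ?string
{
    if (trim($html) === '') {
        return null;
    }
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of silencing them with @
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $img = $doc->getElementsByTagName('img')->item(0);
    return $img ? $img->getAttribute('src') : null;
}

$xml = simplexml_load_string($feed);
$sources = [];
foreach ($xml->channel->item as $item) {
    // Select the children that live in the metadata namespace
    $meta = $item->children('http://example.com/rss/2.0/Metadata');
    $sources[] = firstImageSrc((string) $meta->description);
}
print_r($sources);
```

With the array-based parser, the same firstImageSrc() call could be applied to whichever array key holds the metadata:description text.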
