PHP get img src from xml - php

I have a page with xml that looks like:
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
<channel>
<title>FB-RSS feed for Salman Khan Fc</title>
<link>http://facebook.com/profile.php?id=1636293749919827/</link>
<description>FB-RSS feed for Salman Khan Fc</description>
<managingEditor>http://fbrss.com (FB-RSS)</managingEditor>
<pubDate>31 Mar 16 20:00 +0000</pubDate>
<item>
<title>Photo - Who is the Best Khan ?</title>
<link>https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3</link>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/11059765_1713146978901170_8711054263905505442_n.jpg?oh=fa2978c5ecfb3ae424e9082aaa057b8f&amp;oe=57BB41D5"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;Who is the Best Khan ?</description>
<author>FB-RSS</author>
<guid>1636293749919827_1713146978901170</guid>
<pubDate>31 Mar 16 20:00 +0000</pubDate>
</item>
<item>
<title>Photo</title>
<link>https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3</link>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/12294686_1713146755567859_6728330714340999478_n.jpg?oh=6d90a688fdf4342f9e12e9ff9a66b127&amp;oe=57778068"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;</description>
<author>FB-RSS</author>
<guid>1636293749919827_1713146755567859</guid>
<pubDate>31 Mar 16 19:58 +0000</pubDate>
</item>
</channel>
</rss>
I want to get the srcs of the imgs in the xml above.
The images are stored in the <description>; however, they are not in the format of
<img...
Instead, they look like:
&lt;img src="https://scontent.xx.fbc... .
The < is replaced with &lt;... I guess that's why $imgs = $dom->getElementsByTagName('img'); returns nothing.
Is there any workaround?
This is how I call it:
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadXML( $xml_file);
$imgs = ...(get the imgs to extract the src...('img') ??;
//Then run a possible foreach
//something like:
foreach($imgs as $img){
$src= ///the src of the $img
//try it out
echo '<img src="'.$src.'" /> <br />';
}
Any Idea?

You have HTML embedded in XML tags, so you have to retrieve the XML nodes, load the HTML of each one, and then retrieve the desired tag attribute.
Your XML contains <description> nodes at different levels (one under <channel>, one under each <item>), so ->getElementsByTagName would return more nodes than you want. Use DOMXPath to retrieve only the <description> nodes in the right tree position:
$dom = new DOMDocument();
libxml_use_internal_errors( True );
$dom->loadXML( $xml );
$dom->formatOutput = True;
$xpath = new DOMXPath( $dom );
$nodes = $xpath->query( 'channel/item/description' );
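Then iterate over all nodes, load each node value into a new DOMDocument (no need to decode HTML entities, DOM already decodes them for you), and extract the src attribute from the <img> node:
foreach( $nodes as $node )
{
    if( trim( $node->nodeValue ) === '' ) continue; // loadHTML() cannot parse an empty string
    $html = new DOMDocument();
    $html->loadHTML( $node->nodeValue );
    $img = $html->getElementsByTagName( 'img' )->item(0);
    if( !$img ) continue; // this description contains no image
    $src = $img->getAttribute('src');
    echo $src, PHP_EOL;
}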
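Putting the two steps together, a minimal self-contained sketch might look like the following. The feed string and the example.com URLs are made up for illustration, and the absolute path /rss/channel/item/description is an assumption chosen here so that the query does not depend on the default context node.

```php
<?php
// Hypothetical feed: the HTML inside <description> is entity-escaped,
// as in the question. All URLs are placeholders.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel>
<description>Channel description, no markup</description>
<item><description>&lt;a href="https://example.com/1"&gt;&lt;img src="https://example.com/a.jpg"&gt;&lt;/a&gt;&lt;br&gt;Caption</description></item>
<item><description>No image here</description></item>
<item><description>&lt;img src="https://example.com/b.jpg"&gt;</description></item>
</channel></rss>
XML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$srcs = [];
// Only the <description> elements that sit directly under an <item>
foreach ($xpath->query('/rss/channel/item/description') as $node) {
    $markup = trim($node->nodeValue); // entities are already decoded here
    if ($markup === '') {
        continue; // loadHTML() cannot parse an empty string
    }
    $html = new DOMDocument();
    $html->loadHTML($markup);
    // A description may hold zero, one or several images
    foreach ($html->getElementsByTagName('img') as $img) {
        $srcs[] = $img->getAttribute('src');
    }
}

print_r($srcs);
```

Collecting into an array rather than reading item(0) avoids a fatal error on descriptions that contain no <img> at all.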

Related

how could i import xml list with same name using php domdocument

Using PHP DOMDocument to import an XML file, I can't get the list of "tags".
I have tried multiple ways without success.
XML document:
<resource>
<title>hello world</title>
<tags>
<resource>great</resource>
<resource>fun</resource>
<resource>omg</resource>
</tags>
</resource>
php :
<?php
$url='test.xml';
$doc = new DOMDocument();
$doc->load($url);
$feed = $doc->getElementsByTagName("resource");
foreach($feed as $entry) {
echo $entry->getElementsByTagName("username")->item(0)->nodeValue;
echo '<br>';
echo $entry->getElementsByTagName("tags")->item(0)->nodeValue;
echo '<br>';
}
I expect the output to be a list like this:
hello world
great
fun
omg
but the actual output is not a list; the result is a single run of text without spaces:
hello world greatfunomg
DOMDocument::getElementsByTagName() returns all descendant element nodes with the specified name. DOMElement::$nodeValue will return the text content of an element node including all its descendants.
In your case echo $entry->getElementsByTagName("tags")->item(0)->nodeValue fetches all tags elements, accesses the first node of that list and outputs its text content. That is greatfunomg.
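To make the concatenation visible, here is a small sketch using only plain DOM calls. The inline XML mirrors the question's document; the variable names are illustrative.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<resource><title>hello world</title><tags>'
    . '<resource>great</resource><resource>fun</resource><resource>omg</resource>'
    . '</tags></resource>'
);

$tags = $doc->getElementsByTagName('tags')->item(0);

// nodeValue of <tags> is the text of all its descendants glued together
$joined = $tags->nodeValue;

// Reading each child element separately keeps the values apart
$list = [];
foreach ($tags->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $list[] = $child->nodeValue;
    }
}

echo $joined, "\n";
echo implode("\n", $list), "\n";
```

The instanceof check skips any whitespace text nodes that a pretty-printed file would put between the child elements.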
Using the DOM methods to access nodes is verbose: it requires a lot of code and, if you want to do it right, a lot of conditions. It is much easier to use XPath expressions. They allow you to fetch scalar values and lists of nodes from a DOM.
$xml = <<<'XML'
<_>
<resource>
<title>hello world</title>
<tags>
<resource>great</resource>
<resource>fun</resource>
<resource>omg</resource>
</tags>
</resource>
</_>
XML;
$document = new DOMDocument();
$document->loadXML($xml);
// create an Xpath instance for the document
$xpath = new DOMXpath($document);
// fetch resource nodes that are direct children of the document element
$entries = $xpath->evaluate('/*/resource');
foreach($entries as $entry) {
// fetch the title node of the current entry as a string
echo $xpath->evaluate('string(title)', $entry), "\n";
// fetch resource nodes that are children of the tags node
// and map them into an array of strings
$tags = array_map(
function(\DOMElement $node) {
return $node->textContent;
},
iterator_to_array($xpath->evaluate('tags/resource', $entry))
);
echo implode(', ', $tags), "\n";
}
Output:
hello world
great, fun, omg
If you just need to output the first piece of text for each <resource> element, wherever it is, you can use XPath to select every <resource> element, pick its first child node and output that node's value, provided you ignore whitespace on load.
Ignoring whitespace on load is important: otherwise the padding around each element is turned into text nodes, and the first child of each <resource> element may be just a newline or tab.
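The effect of the whitespace setting can be checked directly. This is a small sketch on a made-up, indented document; it compares the first child of <resource> with and without preserveWhiteSpace.

```php
<?php
// Indented XML, so there is whitespace between the tags
$xml = "<root>\n  <resource>\n    <title>hello world</title>\n  </resource>\n</root>";

// Default settings: the indentation is kept as text nodes
$kept = new DOMDocument();
$kept->loadXML($xml);
$firstKept = $kept->getElementsByTagName('resource')->item(0)->firstChild;

// preserveWhiteSpace must be set before loading to have any effect
$stripped = new DOMDocument();
$stripped->preserveWhiteSpace = false;
$stripped->loadXML($xml);
$firstStripped = $stripped->getElementsByTagName('resource')->item(0)->firstChild;

var_dump($firstKept instanceof DOMText);  // whitespace-only text node
var_dump($firstStripped->nodeName);       // the <title> element
```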
$xml = '<root>
<resource>
<title>hello world</title>
<tags>
<resource>great</resource>
<resource>fun</resource>
<resource>omg</resource>
</tags>
</resource>
</root>';
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadXML($xml);
// $doc->load($filename); // If loading from a file
$xpath = new DOMXpath($doc);
$resources = $xpath->query("//resource");
foreach ( $resources as $resource ){
echo $resource->firstChild->nodeValue.PHP_EOL;
}
The output of which is
hello world
great
fun
omg
Or without using XPath...
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadXML($xml);
//$doc->load($filename);
$resources = $doc->getElementsByTagName("resource");
foreach ( $resources as $resource ){
echo $resource->firstChild->nodeValue.PHP_EOL;
}

Trying to Parse Images and Text from an RSS Feed

This is a continuation of the thread here: Trying to Parse Only the Images from an RSS Feed
This time I want to parse both Images and Certain Items from an RSS feed. A Sampling of the RSS feed looks like this:
<channel>
<atom:link href="http://mywebsite.com/rss" rel="self" type="application/rss+xml" />
<item>
<title>Article One</title>
<guid isPermaLink="true">http://mywebsite.com/details/e8c5106</guid>
<link>http://mywebsite.com/geturl/e8c5106</link>
<comments>http://mywebsite.com/details/e8c5106#comments</comments>
<pubDate>Wed, 09 Jan 2013 02:59:45 -0500</pubDate>
<category>Category 1</category>
<description>
<![CDATA[<div>
<img src="http://mywebsite.com/myimages/1521197-main.jpg" width="120" border="0" />
<ul><li>Poster: someone's name;</li>
<li>PostDate: Tue, 08 Jan 2013 21:49:35 -0500</li>
<li>Rating: 5</li>
<li>Summary:Lorem ipsum dolor </li></ul></div><div style="clear:both;">]]>
</description>
</item>
<item>..
I have the following code below where I try to parse image and text:
$xml = simplexml_load_file('http://mywebsite.com/rss?t=2040&dl=1&i=1');
$descriptions = $xml->xpath('//item/description');
$mytitle= $xml->xpath('//item/title');
foreach ( $descriptions as $description_node ) {
// The description may not be valid XML, so use a more forgiving HTML parser mode
$description_dom = new DOMDocument();
$description_dom->loadHTML( (string)$description_node );
// Switch back to SimpleXML for readability
$description_sxml = simplexml_import_dom( $description_dom );
// Find all images, and extract their 'src' param
$imgs = $description_sxml->xpath('//img');
foreach($imgs as $image) {
echo "<img id=poster class=poster src={$image['src']}> {$mytitle}";
}
}
The above code extracts the images beautifully.... However, it does not extract the $mytitle (which would be "Article One") tag as I try on the last line of my code. This is supposed to extract from all items in the RSS feed.
Can anyone help me figure this one out please.
Many thanks,
Hernando
xpath() always returns an array (see http://www.php.net/manual/en/simplexmlelement.xpath.php), even if just one element is the result. If you know you will expect one element, you can simply use $mytitle[0].
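A quick way to see this, sketched on a made-up one-item feed:

```php
<?php
$feed = simplexml_load_string(
    '<channel><item><title>Article One</title></item></channel>'
);

// xpath() gives back an array of SimpleXMLElement objects
$titles = $feed->xpath('//item/title');
var_dump(is_array($titles));

// Index into the array, then cast the element to a string
$first = (string) $titles[0];
echo $first, "\n";
```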
You will have to iterate over each <item/> element, as otherwise you can't know which description and which title belong together. So the following should work:
$xml = simplexml_load_file('test.xml');
$items = $xml->xpath('//item');
foreach ( $items as $item) {
$descriptions = $item->description;
$mytitle = $item->title;
foreach ( $descriptions as $description_node ) {
// The description may not be valid XML, so use a more forgiving HTML parser mode
$description_dom = new DOMDocument();
$description_dom->loadHTML( (string)$description_node );
// Switch back to SimpleXML for readability
$description_sxml = simplexml_import_dom( $description_dom );
// Find all images, and extract their 'src' param
$imgs = $description_sxml->xpath('//img');
foreach($imgs as $image) {
echo "<img id=\"poster\" class=\"poster\" src=\"{$image['src']}\"> {$mytitle}";
}
}
}
By the way, I also added quotes ("") around the attribute values of your <img/> element. I guess you want that, as this looks very much like XML/HTML.

php xml dom extract data from non-standard xml

Hello I have xml like this:
<specs><my>base</my><root>none</root></specs>
<books>
<item>
<id>14</id>
<title>How to live</title>
</item>
<item>
...
</item>
</books>
How can I extract the value from <my>, and then from <title>?
When the XML contains only data such as <specs><my>base</my><root>none</root></specs>, this code works for me. How should I modify it so that it also works when the XML contains data such as books?
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXPath($dom);
$entry = $xpath->query("//xml/specs/my");
foreach($entry as $ent){
echo $ent->nodeValue;
}
I simply added this:
$xml="<xml>".$xml."</xml>";
and now $xpath->query("//xml/specs/my"); works, as well as $xpath->query("//xml/books/item");
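As a self-contained sketch of that workaround: the input string below is a shortened, made-up version of the question's data. Note the assumption that the input has no <?xml ...?> declaration of its own, since a declaration would no longer be at the start after wrapping.

```php
<?php
// Two top-level elements: not a well-formed XML document by itself
$fragment = '<specs><my>base</my><root>none</root></specs>'
          . '<books><item><id>14</id><title>How to live</title></item></books>';

// Wrap everything in a single artificial root element
$dom = new DOMDocument();
$dom->loadXML('<xml>' . $fragment . '</xml>');
$xpath = new DOMXPath($dom);

// string(...) returns the text of the first matching node
$my = $xpath->evaluate('string(/xml/specs/my)');

$titles = [];
foreach ($xpath->query('/xml/books/item/title') as $title) {
    $titles[] = $title->nodeValue;
}

echo $my, "\n";
echo implode(', ', $titles), "\n";
```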

Accessing a single XML DOM Document node

I am completely new to DOM documents. Basically, what I am trying to do is load an RSS feed, select only one node, and then save it to an XML file.
Here is the XML I am loading from a web feed:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Markets</title>
<description/>
<link>http://www.website.com</link>
<language>en-us</language>
<copyright>XML Output Copyright</copyright>
<ttl>15</ttl>
<pubDate>Tue, 16 Nov 2010 09:38:00 +0000</pubDate>
<webMaster>admin#website.com</webMaster>
<image>
<title>title</title>
<url>http://www.website.com/images/xmllogo.gif</url>
<link>http://www.website.com</link>
<width>144</width>
<height>16</height>
</image>
<item>
<title>title</title>
<description>the description goes here
</description>
<enclosure url="http://www.website.com/images/image.png" type="image/png"/>
</item>
</channel>
</rss>
Here is my lame attempt at getting the <description> node and saving it to feed.xml:
<?php
$feed = new DOMDocument();
$feed->load('http://www.website.com/directory/directory/cz.c');
$nodeValue = $feed->getElementsByTagName('description')->item(0)->nodeValue;
$feed->save("feed.xml");
?>
So basically I need to get the description tag and save it as an XML file.
Any help would be appreciated, thanks in advance!
Almost correct. To get the "outerXml" of a node, you can pass the node to saveXml()
$feed = new DOMDocument();
$feed->load('http://www.website.com/directory/directory/cz.c');
$xml = $feed->saveXml($feed->getElementsByTagName('description')->item(0));
file_put_contents("feed.xml", $xml);
Saving with file_put_contents will not include an XML prolog. Note that in your example, the first description element is empty, so the file will contain <description/>.
If you want to extract the node as a standalone XML document, you have to instantiate a new DOMDocument, import the DOMNode into it and then use save().
$dom = new DOMDocument($feed->xmlVersion, $feed->xmlEncoding);
$dom->appendChild(
$dom->importNode(
$feed->getElementsByTagName('description')->item(0),
TRUE
)
);
echo $dom->save('new.xml');
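The difference between the two approaches shows up when both results are kept as strings. This is a sketch on a made-up inline feed, so nothing is fetched or written to disk.

```php
<?php
$feed = new DOMDocument();
$feed->loadXML('<rss><channel><description>Markets feed</description></channel></rss>');
$node = $feed->getElementsByTagName('description')->item(0);

// 1) Serialize just the node: no XML declaration is emitted
$fragment = $feed->saveXML($node);

// 2) Import the node into its own document: the declaration is emitted
$standalone = new DOMDocument('1.0', 'UTF-8');
$standalone->appendChild($standalone->importNode($node, true));
$full = $standalone->saveXML();

echo $fragment, "\n";
echo $full;
```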
$feed = simplexml_load_file('feed.xml');
$descr=$feed->channel->description;
Try this

Transform RSS-Feed into another "standard" XML-Format with PHP

quick question: I need to transform a default RSS Structure into another XML-format.
The RSS File is like....
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Name des RSS Feed</title>
<description>Feed Beschreibung</description>
<language>de</language>
<link>http://xml-rss.de</link>
<lastBuildDate>Sat, 1 Jan 2000 00:00:00 GMT</lastBuildDate>
<item>
<title>Titel der Nachricht</title>
<description>Die Nachricht an sich</description>
<link>http://xml-rss.de/link-zur-nachricht.htm</link>
<pubDate>Sat, 1. Jan 2000 00:00:00 GMT</pubDate>
<guid>01012000-000000</guid>
</item>
<item>
<title>Titel der Nachricht</title>
<description>Die Nachricht an sich</description>
<link>http://xml-rss.de/link-zur-nachricht.htm</link>
<pubDate>Sat, 1. Jan 2000 00:00:00 GMT</pubDate>
<guid>01012000-000000</guid>
</item>
<item>
<title>Titel der Nachricht</title>
<description>Die Nachricht an sich</description>
<link>http://xml-rss.de/link-zur-nachricht.htm</link>
<pubDate>Sat, 1. Jan 2000 00:00:00 GMT</pubDate>
<guid>01012000-000000</guid>
</item>
</channel>
</rss>
...and I want to extract only the item elements (with children and attributes), producing XML like:
<?xml version="1.0" encoding="ISO-8859-1"?>
<item>
<title>Titel der Nachricht</title>
<description>Die Nachricht an sich</description>
<link>http://xml-rss.de/link-zur-nachricht.htm</link>
<pubDate>Sat, 1. Jan 2000 00:00:00 GMT</pubDate>
<guid>01012000-000000</guid>
</item>
...
It doesn't have to be stored in a file. I just need the output.
Edit: Furthermore, you need to know that the RSS file can have a varying number of items. This is just a sample, so it has to be processed in a loop (while, for, foreach, ...).
I tried different approaches with DOMNode, SimpleXML, XPath, ... but without success.
Thanks
chris
A different approach would be to use an XSLT:
$xsl = <<< XSL
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<items>
<xsl:copy-of select="//item"/>
</items>
</xsl:template>
</xsl:stylesheet>
XSL;
The above stylesheet has just one rule: deep-copy all <item> elements from the source XML into the result and ignore everything else. The nodes are copied into an <items> element that serves as the root node. To process this, you'd do
$xslDoc = new DOMDocument(); // create Doc for XSLT
$xslDoc->loadXML($xsl); // load stylesheet into it
$xmlDoc = new DOMDocument(); // create Doc for RSS
$xmlDoc->loadXML($xml); // load your XML/RSS into it
$proc = new XSLTProcessor(); // init XSLT engine
$proc->importStylesheet($xslDoc); // load stylesheet into engine
echo $proc->transformToXML($xmlDoc); // output transformed XML
Instead of outputting, you could just write the return value to file.
Further reading:
http://de3.php.net/manual/en/class.xsltprocessor.php
http://www.w3.org/TR/xslt#copy-of
What you ask for is hardly a transformation. You are basically just extracting the <item> elements as they are. Also, the result you give is not valid XML, as it lacks a root node.
Apart from that, you can simply do it like this:
$dom = new DOMDocument; // init new DOMDocument
$dom->loadXML($xml); // load some XML into it
$xpath = new DOMXPath($dom); // create a new XPath
$nodes = $xpath->query('//item'); // Find all item elements
foreach($nodes as $node) { // Iterate over found item elements
echo $dom->saveXml($node); // output the item node's outer XML
}
The above would echo the <item> nodes. You could also buffer the output, concatenate it into a string, or collect it in an array and implode it, and then write it to a file.
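For instance, collecting the serialized items in an array and joining them afterwards could be sketched like this. The two-item feed is made up, and the result is kept in $output rather than written anywhere.

```php
<?php
$xml = '<rss><channel>'
     . '<item><title>A</title></item>'
     . '<item><title>B</title></item>'
     . '</channel></rss>';

$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Serialize each <item> on its own and keep the pieces in an array
$chunks = [];
foreach ($xpath->query('//item') as $node) {
    $chunks[] = $dom->saveXML($node);
}

// Join the pieces; $output could now be echoed or passed to file_put_contents()
$output = implode("\n", $chunks);
echo $output, "\n";
```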
If you want to do it properly with DOM (and a root node), the full code would be:
$dom = new DOMDocument; // init DOMDocument for RSS
$dom->loadXML($xml); // load some XML into it
$items = new DOMDocument; // init DOMDocument for new file
$items->preserveWhiteSpace = FALSE; // dump whitespace
$items->formatOutput = TRUE; // make output pretty
$items->loadXML('<items/>'); // create root node
$xpath = new DOMXPath($dom); // create a new XPath
$nodes = $xpath->query('//item'); // Find all item elements
foreach($nodes as $node) { // iterate over found item nodes
$copy = $items->importNode($node, TRUE); // deep copy of item node
$items->documentElement->appendChild($copy); // append item nodes
}
echo $items->saveXML(); // outputs the new document
Instead of saveXML(), you'd use save('filename.xml') to write it to a file.
Try:
<?php
$xmlFile = new DOMDocument(); //Instantiate new DOMDocument
$xmlFile->load("URL TO RSS/XML FILE"); //Load in XML/RSS file
$title = [];
$description = [];
$link = [];
$pubDate = [];
$guid = [];
//Loop over every <item> and read the children of that item only
foreach($xmlFile->getElementsByTagName("item") as $item)
{
    $title[] = $item->getElementsByTagName("title")->item(0)->nodeValue; //Get the value of this item's <title>
    $description[] = $item->getElementsByTagName("description")->item(0)->nodeValue;
    $link[] = $item->getElementsByTagName("link")->item(0)->nodeValue;
    $pubDate[] = $item->getElementsByTagName("pubDate")->item(0)->nodeValue;
    $guid[] = $item->getElementsByTagName("guid")->item(0)->nodeValue;
}
?>
Untested but the arrays
$title[]
$description[]
$link[]
$pubDate[]
$guid[]
should be populated with all of the data that you need!
EDIT:
OK so another approach:
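<?php
$xmlString = file_get_contents("URL TO RSS/XML FILE");
//preg_match_all() puts the text captured between each pair of tags into $m[1]
//"#" is the delimiter because the pattern itself contains "/"
preg_match_all("#<title>(.*?)</title>#s", $xmlString, $m);
$titles = $m[1];
preg_match_all("#<description>(.*?)</description>#s", $xmlString, $m);
$descriptions = $m[1];
preg_match_all("#<link>(.*?)</link>#s", $xmlString, $m);
$links = $m[1];
preg_match_all("#<pubDate>(.*?)</pubDate>#s", $xmlString, $m);
$pubDates = $m[1];
preg_match_all("#<guid>(.*?)</guid>#s", $xmlString, $m);
$guids = $m[1];
?>
In this example each variable will be filled with an array of the matched values. Note that the channel-level <title>, <description> and <link> are matched as well, not only those inside <item>.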
