Quick question: I need to transform a default RSS structure into another XML format.
The RSS file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Name des RSS Feed</title>
<description>Feed Beschreibung</description>
<language>de</language>
<link>http://xml-rss.de</link>
<lastBuildDate>Sat, 1 Jan 2000 00:00:00 GMT</lastBuildDate>
<item>
<title>Titel der Nachricht</title>
<description>Die Nachricht an sich</description>
<link>http://xml-rss.de/link-zur-nachricht.htm</link>
<pubDate>Sat, 1. Jan 2000 00:00:00 GMT</pubDate>
<guid>01012000-000000</guid>
</item>
<item>
<title>Titel der Nachricht</title>
<description>Die Nachricht an sich</description>
<link>http://xml-rss.de/link-zur-nachricht.htm</link>
<pubDate>Sat, 1. Jan 2000 00:00:00 GMT</pubDate>
<guid>01012000-000000</guid>
</item>
<item>
<title>Titel der Nachricht</title>
<description>Die Nachricht an sich</description>
<link>http://xml-rss.de/link-zur-nachricht.htm</link>
<pubDate>Sat, 1. Jan 2000 00:00:00 GMT</pubDate>
<guid>01012000-000000</guid>
</item>
</channel>
</rss>
...and I want to extract only the item elements (with their children and attributes) as XML like:
<?xml version="1.0" encoding="ISO-8859-1"?>
<item>
<title>Titel der Nachricht</title>
<description>Die Nachricht an sich</description>
<link>http://xml-rss.de/link-zur-nachricht.htm</link>
<pubDate>Sat, 1. Jan 2000 00:00:00 GMT</pubDate>
<guid>01012000-000000</guid>
</item>
...
It doesn't have to be stored in a file; I just need the output.
Edit: You should also know that the RSS file can have a variable number of items. This is just a sample, so it has to be looped with while, for, foreach, ...
I tried different approaches with DOMNode, SimpleXML, XPath, ... but without success.
Thanks
chris
A different approach would be to use an XSLT:
$xsl = <<< XSL
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<items>
<xsl:copy-of select="//item"/>
</items>
</xsl:template>
</xsl:stylesheet>
XSL;
The above stylesheet has just one rule: deep-copy all <item> elements from the source XML and ignore everything else in the source file. The nodes are copied into an <items> element, which serves as the root node. To process this, you'd do:
$xslDoc = new DOMDocument(); // create Doc for XSLT
$xslDoc->loadXML($xsl); // load stylesheet into it
$xmlDoc = new DOMDocument(); // create Doc for RSS
$xmlDoc->loadXML($xml); // load your XML/RSS into it
$proc = new XSLTProcessor(); // init XSLT engine
$proc->importStylesheet($xslDoc); // load stylesheet into engine
echo $proc->transformToXML($xmlDoc); // output transformed XML
Instead of outputting it, you could write the return value to a file.
Further reading:
http://de3.php.net/manual/en/class.xsltprocessor.php
http://www.w3.org/TR/xslt#copy-of
What you ask for is hardly a transformation. You are basically just extracting the <item> elements as they are. Also, the result you give is not valid XML, as it lacks a root node.
Apart from that, you can simply do it like this:
$dom = new DOMDocument; // init new DOMDocument
$dom->loadXML($xml); // load some XML into it
$xpath = new DOMXPath($dom); // create a new XPath
$nodes = $xpath->query('//item'); // Find all item elements
foreach($nodes as $node) { // Iterate over found item elements
echo $dom->saveXml($node); // output the outer XML of the item node
}
The above would echo the <item> nodes. You could also buffer the output, concatenate it to a string, or collect it in an array and implode it, and then write it to a file.
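The "collect in an array and implode" variant could look roughly like this. It is only a sketch: the inline feed in $xml is made-up sample data, and $parts and $output are names chosen for the example.

```php
<?php
// Hypothetical sample feed, standing in for the real RSS string.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Feed</title>
<item><title>First</title><guid>1</guid></item>
<item><title>Second</title><guid>2</guid></item>
</channel>
</rss>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$parts = [];
foreach ($xpath->query('//item') as $node) {
    $parts[] = $dom->saveXML($node); // serialize one <item> including its children
}

// Join the serialized items; this string could be echoed or written to a file.
$output = implode("\n", $parts);
echo $output, "\n";
```

Because each item is serialized on its own, the result has no XML declaration and no root node, so it is a list of fragments rather than a standalone XML document.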
If you want to do it properly with DOM (and a root node), the full code would be:
$dom = new DOMDocument; // init DOMDocument for RSS
$dom->loadXML($xml); // load some XML into it
$items = new DOMDocument; // init DOMDocument for new file
$items->preserveWhiteSpace = FALSE; // dump whitespace
$items->formatOutput = TRUE; // make output pretty
$items->loadXML('<items/>'); // create root node
$xpath = new DOMXPath($dom); // create a new XPath
$nodes = $xpath->query('//item'); // Find all item elements
foreach($nodes as $node) { // iterate over found item nodes
$copy = $items->importNode($node, TRUE); // deep copy of item node
$items->documentElement->appendChild($copy); // append item nodes
}
echo $items->saveXML(); // outputs the new document
Instead of saveXML(), you'd use save('filename.xml') to write it to a file.
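Since the question mentions SimpleXML, here is a rough equivalent with that extension. Treat it as a sketch under assumptions: the inline feed is invented sample data, and the path $rss->channel->item assumes the plain RSS 2.0 layout shown in the question.

```php
<?php
// Hypothetical sample feed, standing in for the real RSS string.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Feed</title>
<item><title>First</title><guid>1</guid></item>
<item><title>Second</title><guid>2</guid></item>
</channel>
</rss>
XML;

$rss = simplexml_load_string($xml);

$items = [];
foreach ($rss->channel->item as $item) {
    // asXML() on a child element returns only that element's markup
    $items[] = $item->asXML();
}

echo implode("\n", $items), "\n";
```

This works for any number of items, because iterating $rss->channel->item visits every <item> child of <channel>.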
Try:
<?php
$xmlFile = new DOMDocument(); // Instantiate new DOMDocument
$xmlFile->load("URL TO RSS/XML FILE"); // Load in XML/RSS file
// Start with empty arrays so that no blank first entry is added
$title = [];
$description = [];
$link = [];
$pubDate = [];
$guid = [];
// Loop over the <item> elements and read each child relative to its own item
foreach ($xmlFile->getElementsByTagName("item") as $item)
{
    $title[] = $item->getElementsByTagName("title")->item(0)->nodeValue; // Get the value of the node <title>
    $description[] = $item->getElementsByTagName("description")->item(0)->nodeValue;
    $link[] = $item->getElementsByTagName("link")->item(0)->nodeValue;
    $pubDate[] = $item->getElementsByTagName("pubDate")->item(0)->nodeValue;
    $guid[] = $item->getElementsByTagName("guid")->item(0)->nodeValue;
}
?>
Untested but the arrays
$title[]
$description[]
$link[]
$pubDate[]
$guid[]
should be populated with all of the data that you need!
EDIT:
OK so another approach:
<?php
$xmlString = file_get_contents("URL TO RSS/XML FILE");
// preg_match_all() collects every match; '#' delimiters avoid escaping the '/' in the closing tags
preg_match_all('#<title>(.*?)</title>#s', $xmlString, $titles);
preg_match_all('#<description>(.*?)</description>#s', $xmlString, $descriptions);
preg_match_all('#<link>(.*?)</link>#s', $xmlString, $links);
preg_match_all('#<pubDate>(.*?)</pubDate>#s', $xmlString, $pubDates);
preg_match_all('#<guid>(.*?)</guid>#s', $xmlString, $guids);
?>
In this example, index 1 of each variable (e.g. $titles[1]) holds the captured values. Note that the title, description and link of the channel itself are matched as well, not only those of the items.
Related
Let's say I have the following .xml file:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<item>
<name>Foo</name>
</item>
<item>
<name>Bar</name>
</item>
</root>
In this sample file, I'm trying to append newly created <item> nodes to the <root> node, after the last existing <item> node.
<?php
$file = new DOMDocument;
$file->load("xml.xml");
$file->loadXML($file->saveXML());
$root = $file->getElementsByTagName('root')->item(0);
foreach (["Foo_1", "Bar_2", "Foo_3", "Bar_4"] as $val) {
$item = new DOMElement('item');
$item->appendChild(new DOMElement('name', $val));
$root->appendChild(item);
}
?>
But I'm getting an error:
Fatal error: Uncaught Error: Call to a member function appendChild() on null in C:\Users\pfort\Desktop\p.php:12
Stack trace:
#0 {main}
thrown in C:\Users\user_acer\Desktop\p.php on line 12
What am I doing wrong?
There are multiple issues with your example code. I will address the error you received first:
There is no element <terminy> in your example XML, so
$root = $file->getElementsByTagName('terminy')->item(0);
will return null. That's why you are receiving the
Call to a member function appendChild() on null
error at
$root->appendChild(item);
Also, item is a typo, because it's not a valid variable name (but a name for a non-existent constant); you meant $item.
I'm assuming "terminy" means something similar to "root" in your native language and that you actually meant to write
$root = $file->getElementsByTagName('root')->item(0);
By the way: if you want a reference to the root node of an XML document, you can also use $file->documentElement.
However, there are other issues with your example code:
$file->load("xml.xml");
$file->loadXML($file->saveXML()); // why are you reloading it in this way?
The last line is unnecessary. You are reloading the same XML again. Is it for formatting purposes? If so, there's a better option available:
$file->preserveWhiteSpace = false;
$file->formatOutput = true;
$file->load("xml.xml");
Lastly: you cannot append children to a node that has not been associated with a document yet. So, to create a new item and associate it with the document, you either do (recommended):
// automatically associate new nodes with document
$item = $file->createElement('item');
$item->appendChild($file->createElement('name', $val));
or (more cumbersome):
// import nodes to associate them with document
$item = $file->importNode(new DOMElement('item'));
$item->appendChild($file->importNode(new DOMElement('name', $val)));
So, putting it all together it becomes:
<?php
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<root>
<item>
<name>Foo</name>
</item>
<item>
<name>Bar</name>
</item>
</root>
XML;
$file = new DOMDocument;
$file->preserveWhiteSpace = false;
$file->formatOutput = true;
$file->loadXML($xml); // (for demo purpose loading above XML) replace this with $file->load("xml.xml"); in your actual code
$root = $file->documentElement;
foreach (["Foo_1", "Bar_2", "Foo_3", "Bar_4"] as $val) {
$item = $file->createElement('item');
$item->appendChild($file->createElement('name', $val));
$root->appendChild($item);
}
echo $file->saveXML();
**PROBLEM SOLVED**
I lost too much time on this problem. The good news is that I now know how to get what I need. Here is my solution, for anyone who needs to solve the same problem.
<?php
// XML template for the new item
$xml = <<<XML
<item date="%s" status="%s">
<name>%s</name>
</item>
XML;
// prepare snippet
$xmlSnippet = sprintf($xml, "2022-11-21", 0, "Foo Bar");
// new DOMDocument
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
// read the .xml file with SimpleXML and load its content into the DOMDocument object
$file = simplexml_load_file("xml.xml");
$dom->loadXML($file->asXML());
// creating of fragment from snippet
$fragment = $dom->createDocumentFragment();
$fragment->appendXML($xmlSnippet);
//append the snippet to the DOMDocument
// and save it to the xml.xml file
$dom->documentElement->appendChild($fragment);
$dom->save("xml.xml");
?>
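As a possible alternative to the sprintf() template, the same kind of item can be built with DOM methods. One difference worth knowing: sprintf() does not escape characters like & or <, whereas a text node does. This is only a sketch; the inline '<root>…' string stands in for xml.xml, and the value 'Foo & Bar' is chosen just to show the escaping.

```php
<?php
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
// Stand-in for the content of xml.xml (assumption for this demo).
$dom->loadXML('<root><item><name>Foo</name></item></root>');

// Build <item date="..." status="..."> with its attributes.
$item = $dom->createElement('item');
$item->setAttribute('date', '2022-11-21');
$item->setAttribute('status', '0');

// A text node escapes special characters such as & automatically.
$name = $dom->createElement('name');
$name->appendChild($dom->createTextNode('Foo & Bar'));
$item->appendChild($name);

$dom->documentElement->appendChild($item);

$result = $dom->saveXML(); // in real code: $dom->save("xml.xml");
echo $result;
```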
Result:
I have a page with xml that looks like:
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
<channel>
<title>FB-RSS feed for Salman Khan Fc</title>
<link>http://facebook.com/profile.php?id=1636293749919827/</link>
<description>FB-RSS feed for Salman Khan Fc</description>
<managingEditor>http://fbrss.com (FB-RSS)</managingEditor>
<pubDate>31 Mar 16 20:00 +0000</pubDate>
<item>
<title>Photo - Who is the Best Khan ?</title>
<link>https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3</link>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/11059765_1713146978901170_8711054263905505442_n.jpg?oh=fa2978c5ecfb3ae424e9082aaa057b8f&amp;oe=57BB41D5"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;Who is the Best Khan ?</description>
<author>FB-RSS</author>
<guid>1636293749919827_1713146978901170</guid>
<pubDate>31 Mar 16 20:00 +0000</pubDate>
</item>
<item>
<title>Photo</title>
<link>https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3</link>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/12294686_1713146755567859_6728330714340999478_n.jpg?oh=6d90a688fdf4342f9e12e9ff9a66b127&amp;oe=57778068"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;</description>
<author>FB-RSS</author>
<guid>1636293749919827_1713146755567859</guid>
<pubDate>31 Mar 16 19:58 +0000</pubDate>
</item>
</channel>
</rss>
I want to get the srcs of the imgs in the xml above.
The images are stored in the <description>; however, they are not in the format of
<img...
Instead, they look like:
&lt;img src="https://scontent.xx.fbc... .
The < is replaced with &lt;... I guess that's why $imgs = $dom->getElementsByTagName('img'); returns nothing.
Is there any workaround?
This is how I call it:
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadXML( $xml_file);
$imgs = ...(get the imgs to extract the src...('img') ??;
//Then run a possible foreach
//something like:
foreach($imgs as $img){
$src= ///the src of the $img
//try it out
echo '<img src="'.$src.'" /> <br />',
}
Any Idea?
You have HTML embedded in XML tags, so you have to retrieve the XML nodes, load each one as HTML, and then retrieve the desired tag attribute.
In your XML there are different <description> nodes, so using ->getElementsByTagName will return more than the nodes you want. Use DOMXPath to retrieve only the <description> nodes in the right tree position:
$dom = new DOMDocument();
libxml_use_internal_errors( True );
$dom->loadXML( $xml );
$dom->formatOutput = True;
$xpath = new DOMXPath( $dom );
$nodes = $xpath->query( 'channel/item/description' );
Then iterate over all nodes, load each node value into a new DOMDocument (no need to decode HTML entities, DOM already decodes them for you), and extract the src attribute from the <img> node:
foreach( $nodes as $node )
{
$html = new DOMDocument();
$html->loadHTML( $node->nodeValue );
$src = $html->getElementsByTagName( 'img' )->item(0)->getAttribute('src');
}
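Put together, and with a guard for descriptions that contain no image, the approach might look like the sketch below. The feed is a shortened, invented sample with example.com URLs, the absolute path /rss/channel/item/description is used here instead of the relative one, and $srcs is a name picked for the demo.

```php
<?php
// Hypothetical, shortened feed; the item description holds entity-escaped HTML.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<description>Feed description</description>
<item>
<description>&lt;a href="https://example.com/p/1"&gt;&lt;img src="https://example.com/1.jpg"&gt;&lt;/a&gt;&lt;br&gt;Caption</description>
</item>
<item>
<description>Text only</description>
</item>
</channel>
</rss>
XML;

libxml_use_internal_errors(true); // keep HTML parser warnings quiet

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$srcs = [];
foreach ($xpath->query('/rss/channel/item/description') as $node) {
    if (trim($node->nodeValue) === '') {
        continue; // loadHTML() rejects empty input
    }
    $html = new DOMDocument();
    $html->loadHTML($node->nodeValue); // nodeValue is already entity-decoded
    // Looping handles zero, one or several <img> tags per description.
    foreach ($html->getElementsByTagName('img') as $img) {
        $srcs[] = $img->getAttribute('src');
    }
}

print_r($srcs);
```

Looping over the <img> elements instead of calling ->item(0) directly avoids a call on null when a description has no image.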
Hello, I have XML like this:
<specs><my>base</my><root>none</root></specs>
<books>
<item>
<id>14</id>
<title>How to live</title>
</item>
<item>
...
</item>
</books>
How can I extract the value from <my>, and then from <title>?
When the XML only contains data such as <specs><my>base</my><root>none</root></specs>, this code works for me. How should I modify it so that it also works with data such as books?
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXPath($dom);
$entry = $xpath->query("//xml/specs/my");
foreach($entry as $ent){
echo $ent->nodeValue;
}
I simply added this:
$xml="<xml>".$xml."</xml>";
and now $xpath->query("//xml/specs/my"); works, as well as $xpath->query("//xml/books/item");
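For reference, a runnable version of the wrapper idea could look like this. It is a sketch that assumes $xml contains no XML declaration of its own (a declaration is only allowed at the very start of a document, so wrapping would break it); the sample data is taken from the question and shortened.

```php
<?php
// Two top-level elements: not well-formed XML on its own.
$xml = '<specs><my>base</my><root>none</root></specs>'
     . '<books><item><id>14</id><title>How to live</title></item></books>';

$dom = new DOMDocument();
// Wrap everything in a single artificial root element before parsing.
$dom->loadXML('<xml>' . $xml . '</xml>');
$xpath = new DOMXPath($dom);

$my = $xpath->query('/xml/specs/my')->item(0)->nodeValue;

$titles = [];
foreach ($xpath->query('/xml/books/item/title') as $title) {
    $titles[] = $title->nodeValue;
}

echo $my, "\n", implode(', ', $titles), "\n";
```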
I've got a problem with parsing an XML file (N.B.: a well-formed one).
Consider XML file like this:
<?xml version="1.0" encoding="utf-8" ?>
<root>
<list>
<item no="1">
<title>Item's 1 title</title>
<content>Some long content with <special>tags</special> inside</content>
</item>
<item no="2">
<title>Item's 2 title</title>
<content>Some long content with <special>tags</special> inside</content>
</item>
</list>
</root>
I need to get the contents of each item in the list and put them in an array. Generally that's not a problem, but in this case I can't get my head round it.
The problem lies in the contents of <content>. It is a string with tags in between, and I can't find a way to extract it. SimpleXML returns/echoes just the string, with the <special> tags and everything inside them stripped out. Like this:
Some long content with inside.
I'd ideally want it to get a string like this:
Some long content with <special>tags</special> inside
How do I get it?
You could use DOMDocument which is built into PHP.
<?php
$xml = <<<END
<?xml version="1.0" encoding="utf-8" ?>
<root>
<list>
<item no="1">
<title>Item's 1 title</title>
<content>Some long content with <special>tags</special> inside</content>
</item>
<item no="2">
<title>Item's 2 title</title>
<content>Some long content with <special>tags</special> inside</content>
</item>
</list>
</root>
END;
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML($xml);
$nodes = $doc->getElementsByTagName('content');
foreach ( $nodes as $node )
{
$temp_doc = new DOMDocument('1.0', 'UTF-8');
foreach ( $node->childNodes as $child )
$temp_doc->appendChild($temp_doc->importNode($child, true));
echo $temp_doc->saveHTML(); // Outputs: Some long content with <special>tags</special> inside
}
To select the top level "content" elements (in case there are "content" elements inside), you can use DOMXPath.
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML($xml); // $xml from the example above
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('/root/list/item/content');
foreach ( $nodes as $node )
{
$temp_doc = new DOMDocument('1.0', 'UTF-8');
foreach ( $node->childNodes as $child )
$temp_doc->appendChild($temp_doc->importNode($child, true));
echo $temp_doc->saveHTML(); // Outputs: Some long content with <special>tags</special> inside
}
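A variation that avoids the temporary document is to serialize each child node of <content> with saveXML() and concatenate the pieces. This is a sketch with a shortened, single-item version of the sample XML; $inner and $contents are names chosen for the example.

```php
<?php
// Shortened version of the sample document (one item only).
$xml = '<root><list><item no="1"><title>T</title>'
     . '<content>Some long content with <special>tags</special> inside</content>'
     . '</item></list></root>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$contents = [];
foreach ($doc->getElementsByTagName('content') as $node) {
    $inner = '';
    // Serialize text nodes and element nodes alike, in document order.
    foreach ($node->childNodes as $child) {
        $inner .= $doc->saveXML($child);
    }
    $contents[] = $inner;
}

print_r($contents);
```

Because saveXML() is given a node, it returns only that node's markup, so the surrounding <content> tags are left out.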
SimpleXML just doesn't support mixed content (text nodes with element nodes as siblings). I suggest you use XMLReader instead.
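An XMLReader version might look like the following. It is a sketch under the assumption that readInnerXml() is available in your build (it depends on the underlying libxml version); the XML is a shortened version of the sample.

```php
<?php
// Shortened version of the sample document (one item only).
$xml = '<root><list><item no="1"><title>T</title>'
     . '<content>Some long content with <special>tags</special> inside</content>'
     . '</item></list></root>';

$reader = new XMLReader();
$reader->XML($xml); // parse from a string instead of a file

$contents = [];
while ($reader->read()) {
    // Stop at every opening <content> tag and grab its inner markup.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'content') {
        $contents[] = $reader->readInnerXml();
    }
}
$reader->close();

print_r($contents);
```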
You could use SimpleXML's asXML function. It returns the node it is called on as an XML string:
$xml = simplexml_load_file($file);
foreach($xml->list->item as $item) {
$content = $item->content->asXML();
echo $content."\n";
}
will print:
<content>Some long content with <special>tags</special> inside</content>
<content>Some long content with <special>tags</special> inside</content>
it's a little ugly but you could then clip out the <content> and </content> with a substr:
$content = substr($content,9,-10);
I am completely new to DOM documents. Basically, what I am trying to do is load an RSS feed, select only one node, and then save it to an XML file.
Here is the XML I am loading from a web feed:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Markets</title>
<description/>
<link>http://www.website.com</link>
<language>en-us</language>
<copyright>XML Output Copyright</copyright>
<ttl>15</ttl>
<pubDate>Tue, 16 Nov 2010 09:38:00 +0000</pubDate>
<webMaster>admin#website.com</webMaster>
<image>
<title>title</title>
<url>http://www.website.com/images/xmllogo.gif</url>
<link>http://www.website.com</link>
<width>144</width>
<height>16</height>
</image>
<item>
<title>title</title>
<description>the description goes here
</description>
<enclosure url="http://www.website.com/images/image.png" type="image/png"/>
</item>
</channel>
</rss>
Here is my lame attempt at getting the <description> node and saving it to feed.xml:
<?php
$feed = new DOMDocument();
$feed->load('http://www.website.com/directory/directory/cz.c');
$nodeValue = $feed->getElementsByTagName('description')->item(0)->nodeValue;
$feed->save("feed.xml");
?>
So basically I need to get the description tag and save it as an XML file.
Any help would be appreciated, thanks in advance!
Almost correct. To get the "outerXml" of a node, you can pass the node to saveXml()
$feed = new DOMDocument();
$feed->load('http://www.website.com/directory/directory/cz.c');
$xml = $feed->saveXml($feed->getElementsByTagName('description')->item(0));
file_put_contents("feed.xml", $xml);
Saving with file_put_contents will not include an XML prolog. Note that in your example, the first description element is empty, so the file will contain <description/>.
If you want to extract the node as standalone XML Document, you have to instantiate a new DOMDocument and import the DOMNode and then use save().
$dom = new DOMDocument($feed->xmlVersion, $feed->xmlEncoding);
$dom->appendChild(
$dom->importNode(
$feed->getElementsByTagName('description')->item(0),
TRUE
)
);
echo $dom->save('new.xml'); // save() returns the number of bytes written
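If the description you actually want is the one inside <item> rather than the empty channel-level one, an XPath query can target it. A minimal sketch, assuming a shortened inline copy of the feed instead of the remote URL; the path /rss/channel/item/description follows the structure shown in the question.

```php
<?php
// Shortened stand-in for the remote feed (assumption for this demo).
$rss = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Markets</title>
<description/>
<item>
<title>title</title>
<description>the description goes here</description>
</item>
</channel>
</rss>
XML;

$feed = new DOMDocument();
$feed->loadXML($rss);

$xpath = new DOMXPath($feed);
// Select the description of the first item, not the channel's.
$node = $xpath->query('/rss/channel/item/description')->item(0);

$result = '';
if ($node !== null) {
    $out = new DOMDocument('1.0', 'UTF-8');
    $out->appendChild($out->importNode($node, true)); // deep copy
    $result = $out->saveXML(); // or $out->save('feed.xml');
}
echo $result;
```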
Try this:
$feed = simplexml_load_file('feed.xml');
$descr = $feed->channel->description;