php xml dom extract data from non-standard xml

Hello, I have XML like this:
<specs><my>base</my><root>none</root></specs>
<books>
<item>
<id>14</id>
<title>How to live</title>
</item>
<item>
...
</item>
</books>
How can I extract the value of <my>, and then of <title>?
When the XML contains only data such as <specs><my>base</my><root>none</root></specs>, this code works for me. How should I modify it so that it also works when the XML contains the <books> data?
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXPath($dom);
$entry = $xpath->query("//xml/specs/my");
foreach($entry as $ent){
echo $ent->nodeValue;
}

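I simply wrapped the input in a single root element. A well-formed XML document must have exactly one root element, and this input has two top-level elements (<specs> and <books>), so loadXml() cannot parse it as-is:
$xml="<xml>".$xml."</xml>";
Now both $xpath->query("//xml/specs/my"); and $xpath->query("//xml/books/item"); work.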
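A minimal, self-contained sketch of that fix, assuming the input is exactly the two sibling fragments from the question. The wrapper name <xml> is arbitrary; any name works as long as the XPath expressions use the same one:

```php
<?php
// Input from the question: two top-level elements, so it is not a
// well-formed document on its own and loadXml() would reject it.
$xml = '<specs><my>base</my><root>none</root></specs>'
     . '<books><item><id>14</id><title>How to live</title></item></books>';

// Wrap everything in one (arbitrary) root element so there is a single root.
$dom = new DOMDocument();
$dom->loadXml('<xml>' . $xml . '</xml>');
$xpath = new DOMXPath($dom);

// Value of <my>; string() yields '' if the node does not exist.
$my = $xpath->evaluate('string(/xml/specs/my)');

// Text of every <title> under /xml/books/item.
$titles = [];
foreach ($xpath->query('/xml/books/item/title') as $title) {
    $titles[] = $title->nodeValue;
}

echo $my, "\n";                    // base
echo implode(', ', $titles), "\n"; // How to live
```

Using evaluate('string(...)') returns the text directly instead of a node list, which is convenient when exactly one value is expected.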

Related

PHP - Replace child node in XML

I want to rename a child node in the XML through PHP and also add text to that same node. Can you please help me rename the child node and add the text to it in PHP?
PHP Code:
$doc = new DOMDocument;
$doc->load('abc.xml');
$thedocument = $doc->documentElement;
$xpath = new DOMXPath($doc);
// Need a code to replace the name of child node "<id>" to "<link>" and append text "abc.com/" in the node <id>
$doc->formatOutput = true;
$result1 = $doc->saveXML();
$doc->save('abc.xml');
XML (Current):
<rss>
<data>
<item>
<id>1122</id>
<title>Test 123</title>
</item>
</data>
</rss>
XML (Needed):
<rss>
<data>
<item>
<link>abc.com/?qs=1122</link>
<title>Test 123</title>
</item>
</data>
</rss>
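One common approach with DOM is to create a new <link> element, give it the prefixed text, and swap it in for the old <id>. A minimal sketch, assuming every <id> sits directly inside an <item> and that the prefix is the abc.com/?qs= shown in the needed output; the sample string stands in for abc.xml:

```php
<?php
// Sample input from the question; the asker loads abc.xml with $doc->load() instead.
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // lets formatOutput re-indent the result
$doc->loadXML('<rss><data><item><id>1122</id><title>Test 123</title></item></data></rss>');
$doc->formatOutput = true;

$xpath = new DOMXPath($doc);

foreach ($xpath->query('//item/id') as $id) {
    // Build a <link> element to take the place of the <id> element...
    $link = $doc->createElement('link');
    // ...whose text is the prefix plus the old <id> value (createTextNode escapes & < >).
    $link->appendChild($doc->createTextNode('abc.com/?qs=' . $id->textContent));
    $id->parentNode->replaceChild($link, $id);
}

$out = $doc->saveXML(); // or $doc->save('abc.xml') to write the file back
echo $out;
```

Because the new element is inserted with replaceChild(), it keeps the original position, so <link> still comes before <title> as in the needed XML.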

PHP get img src from xml

I have a page with XML that looks like this:
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
<channel>
<title>FB-RSS feed for Salman Khan Fc</title>
<link>http://facebook.com/profile.php?id=1636293749919827/</link>
<description>FB-RSS feed for Salman Khan Fc</description>
<managingEditor>http://fbrss.com (FB-RSS)</managingEditor>
<pubDate>31 Mar 16 20:00 +0000</pubDate>
<item>
<title>Photo - Who is the Best Khan ?</title>
<link>https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3</link>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/11059765_1713146978901170_8711054263905505442_n.jpg?oh=fa2978c5ecfb3ae424e9082aaa057b8f&amp;oe=57BB41D5"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;Who is the Best Khan ?</description>
<author>FB-RSS</author>
<guid>1636293749919827_1713146978901170</guid>
<pubDate>31 Mar 16 20:00 +0000</pubDate>
</item>
<item>
<title>Photo</title>
<link>https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3</link>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/12294686_1713146755567859_6728330714340999478_n.jpg?oh=6d90a688fdf4342f9e12e9ff9a66b127&amp;oe=57778068"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;</description>
<author>FB-RSS</author>
<guid>1636293749919827_1713146755567859</guid>
<pubDate>31 Mar 16 19:58 +0000</pubDate>
</item>
</channel>
</rss>
I want to get the src attributes of the images in the XML above.
The images are stored in the <description>; however, they are not in the format
<img...
but rather look like:
&lt;img src="https://scontent.xx.fbc... .
The < is replaced with &lt;, which I guess is why $imgs = $dom->getElementsByTagName('img'); returns nothing.
Is there any workaround?
This is how I call it:
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadXML( $xml_file);
$imgs = ...; // ??? how do I get the <img> elements so I can extract their src?
// Then run a foreach, something like:
foreach($imgs as $img){
$src = ...; // ??? the src of $img
// try it out
echo '<img src="'.$src.'" /> <br />';
}
Any ideas?

You have HTML embedded in XML tags, so you have to retrieve the XML nodes, load the HTML of each one, and read the desired tag attribute.
Your XML contains <description> nodes at different levels (the <channel> has one too), so ->getElementsByTagName would return more nodes than you want. Use DOMXPath to retrieve only the <description> nodes at the right position in the tree:
$dom = new DOMDocument();
libxml_use_internal_errors( True );
$dom->loadXML( $xml );
$dom->formatOutput = True;
$xpath = new DOMXPath( $dom );
$nodes = $xpath->query( 'channel/item/description' );
Then iterate over all nodes, load each node's value into a new DOMDocument (there is no need to decode the HTML entities; DOM already decodes them for you), and extract the src attribute from the <img> node:
foreach( $nodes as $node )
{
$html = new DOMDocument();
$html->loadHTML( $node->nodeValue );
$img = $html->getElementsByTagName( 'img' )->item(0);
if( $img ) // skip descriptions that contain no image
{
$src = $img->getAttribute('src');
echo $src, "\n";
}
}
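A fuller sketch of the same approach that collects every image URL and skips descriptions without an <img>. It assumes, as the question describes, that the HTML inside <description> is entity-escaped; the sample feed below is shortened and its example.com URLs are made up for illustration:

```php
<?php
// Shortened, made-up sample: the HTML inside <description> is entity-escaped.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel>
<description>Channel description without HTML</description>
<item><description>&lt;a href="https://example.com/1"&gt;&lt;img src="https://example.com/a.jpg"&gt;&lt;/a&gt;&lt;br&gt;Who is the Best Khan ?</description></item>
<item><description>No image here</description></item>
</channel></rss>
XML;

libxml_use_internal_errors(true); // loadHTML() warns about HTML fragments
$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$srcs = [];
// Only item-level descriptions, not the channel's own <description>.
foreach ($xpath->query('/rss/channel/item/description') as $node) {
    $markup = trim($node->nodeValue); // already entity-decoded: real HTML
    if ($markup === '') {
        continue; // loadHTML() rejects empty input
    }
    $html = new DOMDocument();
    $html->loadHTML($markup);
    foreach ($html->getElementsByTagName('img') as $img) {
        $srcs[] = $img->getAttribute('src');
    }
}

foreach ($srcs as $src) {
    // Escape the URL again when writing it back into HTML.
    echo '<img src="' . htmlspecialchars($src) . '" /> <br />', "\n";
}
```

Collecting into $srcs instead of overwriting a single $src keeps every image when a feed has many items.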

PHP XML parsing going directly to value by attribute

I have an XML document that looks like this:
<body>
<item id="9982a">
<value>ab</value>
</item>
<item id="9982b">
<value>abc</value>
</item>
etc...
</body>
Now I need to get the value for a key. The document is very, very big; is there any way to go directly to the key when I know the id, rather than looping over it?
Something like:
$xml = simplexml_load_string(file_get_contents('http://somesite.com/new.xml'));
$body = $xml->body;
$body->item['id'][9982a]; // ab
?

XPath is your friend; you can stay with SimpleXML:
$xml = simplexml_load_string($x); // assume XML in $x
$result = $xml->xpath("/body/item[@id = '9982a']/value")[0]; // requires PHP >= 5.4
echo $result;
In PHP < 5.4, which lacks function array dereferencing, do:
$result = $xml->xpath("/body/item[@id = '9982a']/value");
$result = $result[0];
see it working: https://eval.in/101766

Yes, use Simple HTML DOM Parser instead of SimpleXML.
It would be as easy as:
$xml->find('item[id="9982b"]',0)->find('value',0)->innertext;

With DOMXPath::evaluate() it is possible to fetch scalar values from a DOM using XPath expressions:
$xml = <<<'XML'
<body>
<item id="9982a">
<value>ab</value>
</item>
<item id="9982b">
<value>abc</value>
</item>
</body>
XML;
$dom = new DOMDocument;
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
var_dump(
$xpath->evaluate('string(//body/item[@id="9982b"]/value)')
);
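Since the question mentions a very large document: DOM and SimpleXML both load the whole file into memory before any XPath runs. A streaming alternative is XMLReader, which walks the document node by node and can stop as soon as the match is found. A minimal sketch, assuming the <body>/<item id>/<value> layout above; findValueById is a hypothetical helper name:

```php
<?php
// Hypothetical helper: stream through the XML with XMLReader and return the
// <value> text of the <item> whose id attribute equals $id (null if absent).
function findValueById(XMLReader $reader, string $id): ?string
{
    // Phase 1: skip forward until the matching <item> start tag.
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT
            || $reader->localName !== 'item'
            || $reader->getAttribute('id') !== $id) {
            continue;
        }
        if ($reader->isEmptyElement) {
            return null; // a self-closing <item id="..."/> has no <value>
        }
        // Phase 2: look for <value> inside this <item> only.
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'value') {
                return $reader->readString();
            }
            if ($reader->nodeType === XMLReader::END_ELEMENT && $reader->localName === 'item') {
                return null;
            }
        }
    }
    return null;
}

$reader = new XMLReader();
// For a large file or URL, $reader->open('...') streams it instead of a string.
$reader->XML('<body><item id="9982a"><value>ab</value></item>'
    . '<item id="9982b"><value>abc</value></item></body>');
$value = findValueById($reader, '9982b');
$reader->close();

var_dump($value); // string(3) "abc"
```

This still reads the file sequentially up to the match, but memory use stays flat regardless of document size.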

Parsing XML file

I have a problem parsing an XML file (NB: a well-formed one).
Consider XML file like this:
<?xml version="1.0" encoding="utf-8" ?>
<root>
<list>
<item no="1">
<title>Item's 1 title</title>
<content>Some long content with <special>tags</special> inside</content>
</item>
<item no="2">
<title>Item's 2 title</title>
<content>Some long content with <special>tags</special> inside</content>
</item>
</list>
</root>
I need to get the contents of each item in the list and put them in an array. Generally that's not a problem, but in this case I can't get my head around it.
The problem lies in the contents of <content>: it is a string with tags in between, and I can't find a way to extract it. SimpleXML returns/echoes just the string, with the <special> tags and everything inside them stripped out, like this:
Some long content with inside.
I'd ideally want it to get a string like this:
Some long content with <special>tags</special> inside
How do I get it?

You could use DOMDocument, which is built into PHP.
<?php
$xml = <<<END
<?xml version="1.0" encoding="utf-8" ?>
<root>
<list>
<item no="1">
<title>Item's 1 title</title>
<content>Some long content with <special>tags</special> inside</content>
</item>
<item no="2">
<title>Item's 2 title</title>
<content>Some long content with <special>tags</special> inside</content>
</item>
</list>
</root>
END;
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML($xml);
$nodes = $doc->getElementsByTagName('content');
foreach ( $nodes as $node )
{
$temp_doc = new DOMDocument('1.0', 'UTF-8');
foreach ( $node->childNodes as $child )
$temp_doc->appendChild($temp_doc->importNode($child, true));
echo $temp_doc->saveHTML(); // Outputs: Some long content with <special>tags</special> inside
}
To select only the "content" elements at a specific position in the tree (in case there are nested "content" elements inside), you can use DOMXPath:
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML($xml); // $xml from the example above
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('/root/list/item/content');
foreach ( $nodes as $node )
{
$temp_doc = new DOMDocument('1.0', 'UTF-8');
foreach ( $node->childNodes as $child )
$temp_doc->appendChild($temp_doc->importNode($child, true));
echo $temp_doc->saveHTML(); // Outputs: Some long content with <special>tags</special> inside
}
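A variant that skips the temporary document: DOMDocument::saveXML() accepts a node and serialises only that node, so concatenating it over the children yields the inner XML directly. A sketch with a shortened sample; innerXml is a hypothetical helper name, and the result uses XML rather than HTML serialisation:

```php
<?php
// Hypothetical helper: serialise all children of $node (its "inner XML").
function innerXml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        // saveXML() with a node argument serialises just that node.
        $out .= $node->ownerDocument->saveXML($child);
    }
    return $out;
}

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML('<root><list><item no="1"><title>Item\'s 1 title</title>'
    . '<content>Some long content with <special>tags</special> inside</content>'
    . '</item></list></root>');

$contents = [];
foreach ($doc->getElementsByTagName('content') as $node) {
    $contents[] = innerXml($node);
}
echo $contents[0], "\n"; // Some long content with <special>tags</special> inside
```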

SimpleXML just doesn't support mixed content (text nodes with element nodes as siblings). I suggest you use XMLReader instead.
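A minimal sketch of the XMLReader suggestion, assuming the file structure from the question; XMLReader::readInnerXml() returns the markup between a node's start and end tags as a string:

```php
<?php
$xml = <<<'XML'
<root>
<list>
<item no="1">
<title>Item's 1 title</title>
<content>Some long content with <special>tags</special> inside</content>
</item>
<item no="2">
<title>Item's 2 title</title>
<content>Some long content with <special>tags</special> inside</content>
</item>
</list>
</root>
XML;

$reader = new XMLReader();
$reader->XML($xml); // or $reader->open($file) for a file on disk

$contents = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'content') {
        // Everything between <content> and </content>, tags included.
        $contents[] = $reader->readInnerXml();
    }
}
$reader->close();

print_r($contents);
```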

You could use SimpleXML's asXML() method. It returns the node it is called on as an XML string:
$xml = simplexml_load_file($file);
foreach($xml->list->item as $item) {
$content = $item->content->asXML();
echo $content."\n";
}
will print:
<content>Some long content with <special>tags</special> inside</content>
<content>Some long content with <special>tags</special> inside</content>
It's a little ugly, but you could then clip out the <content> and </content> tags with substr():
$content = substr($content,9,-10);

Transform RSS-Feed into another "standard" XML-Format with PHP

Quick question: I need to transform a standard RSS structure into another XML format.
The RSS file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Name des RSS Feed</title>
<description>Feed Beschreibung</description>
<language>de</language>
<link>http://xml-rss.de</link>
<lastBuildDate>Sat, 1 Jan 2000 00:00:00 GMT</lastBuildDate>
<item>
<title>Titel der Nachricht</title>
<description>Die Nachricht an sich</description>
<link>http://xml-rss.de/link-zur-nachricht.htm</link>
<pubDate>Sat, 1. Jan 2000 00:00:00 GMT</pubDate>
<guid>01012000-000000</guid>
</item>
<item>
<title>Titel der Nachricht</title>
<description>Die Nachricht an sich</description>
<link>http://xml-rss.de/link-zur-nachricht.htm</link>
<pubDate>Sat, 1. Jan 2000 00:00:00 GMT</pubDate>
<guid>01012000-000000</guid>
</item>
<item>
<title>Titel der Nachricht</title>
<description>Die Nachricht an sich</description>
<link>http://xml-rss.de/link-zur-nachricht.htm</link>
<pubDate>Sat, 1. Jan 2000 00:00:00 GMT</pubDate>
<guid>01012000-000000</guid>
</item>
</channel>
</rss>
...and I want to extract only the item elements (with their children and attributes) as XML like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<item>
<title>Titel der Nachricht</title>
<description>Die Nachricht an sich</description>
<link>http://xml-rss.de/link-zur-nachricht.htm</link>
<pubDate>Sat, 1. Jan 2000 00:00:00 GMT</pubDate>
<guid>01012000-000000</guid>
</item>
...
It doesn't have to be stored in a file; I just need the output.
Edit: Note also that the RSS file can contain a varying number of items; this is just a sample. So it has to be looped with while, for, foreach, ...
I tried different approaches with DOMNode, SimpleXML, XPath, ... but without success.
Thanks
chris

A different approach would be to use an XSLT:
$xsl = <<< XSL
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<items>
<xsl:copy-of select="//item"/>
</items>
</xsl:template>
</xsl:stylesheet>
XSL;
The above stylesheet has just one rule, namely deep-copying all <item> elements from the source XML into the output and ignoring everything else in the source file. The copied nodes are placed inside an <items> element, which serves as the root node. To process this, you'd do:
$xslDoc = new DOMDocument(); // create Doc for XSLT
$xslDoc->loadXML($xsl); // load stylesheet into it
$xmlDoc = new DOMDocument(); // create Doc for RSS
$xmlDoc->loadXML($xml); // load your XML/RSS into it
$proc = new XSLTProcessor(); // init XSLT engine
$proc->importStylesheet($xslDoc); // load stylesheet into engine
echo $proc->transformToXML($xmlDoc); // output transformed XML
Instead of outputting, you could just write the return value to file.
Further reading:
http://de3.php.net/manual/en/class.xsltprocessor.php
http://www.w3.org/TR/xslt#copy-of

What you ask for is hardly a transformation. You are basically just extracting the <item> elements as they are. Also, the result you give is not valid XML, as it lacks a root node.
Apart from that, you can simply do it like this:
$dom = new DOMDocument; // init new DOMDocument
$dom->loadXML($xml); // load some XML into it
$xpath = new DOMXPath($dom); // create a new XPath
$nodes = $xpath->query('//item'); // Find all item elements
foreach($nodes as $node) { // Iterate over found item elements
echo $dom->saveXml($node); // output the item node's outer XML
}
The above would echo the <item> nodes. You could simply buffer the output, concatenate it to a string, or write it to an array and implode it, etc., and then write it to a file.
If you want to do it properly with DOM (and a root node), the full code would be:
$dom = new DOMDocument; // init DOMDocument for RSS
$dom->loadXML($xml); // load some XML into it
$items = new DOMDocument; // init DOMDocument for new file
$items->preserveWhiteSpace = FALSE; // dump whitespace
$items->formatOutput = TRUE; // make output pretty
$items->loadXML('<items/>'); // create root node
$xpath = new DOMXPath($dom); // create a new XPath
$nodes = $xpath->query('//item'); // Find all item elements
foreach($nodes as $node) { // iterate over found item nodes
$copy = $items->importNode($node, TRUE); // deep copy of item node
$items->documentElement->appendChild($copy); // append item nodes
}
echo $items->saveXML(); // outputs the new document
Instead of saveXML(), you'd use save('filename.xml') to write it to a file.
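The question's desired output declares encoding="ISO-8859-1", while the code above writes the default declaration. A sketch of the same DOM approach with that declaration, assuming the encoding is set through DOMDocument's second constructor argument and the root element is created with createElement() rather than loadXML() (the sample feed is shortened):

```php
<?php
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel><title>Name des RSS Feed</title>
<item><title>Titel der Nachricht</title><guid>01012000-000000</guid></item>
<item><title>Titel der Nachricht</title><guid>01012000-000000</guid></item>
</channel></rss>
XML;

$rss = new DOMDocument();
$rss->loadXML($xml);

// The encoding given here is written into the XML declaration by saveXML().
$items = new DOMDocument('1.0', 'ISO-8859-1');
$items->formatOutput = true;
$root = $items->appendChild($items->createElement('items'));

// Deep-copy every <item> into the new document under <items>.
foreach ((new DOMXPath($rss))->query('/rss/channel/item') as $item) {
    $root->appendChild($items->importNode($item, true));
}

$out = $items->saveXML();
echo $out;
```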

Try:
<?php
$xmlFile = new DOMDocument(); //Instantiate new DOMDocument
$xmlFile->load("URL TO RSS/XML FILE"); //Load in XML/RSS file
$title = $description = $link = $pubDate = $guid = [];
foreach($xmlFile->getElementsByTagName("item") as $item)
{
// Search inside the current <item>; searching $xmlFile would always return the first match (the channel's <title>)
$title[] = $item->getElementsByTagName("title")->item(0)->nodeValue; //Get the value of the node <title>
$description[] = $item->getElementsByTagName("description")->item(0)->nodeValue;
$link[] = $item->getElementsByTagName("link")->item(0)->nodeValue;
$pubDate[] = $item->getElementsByTagName("pubDate")->item(0)->nodeValue;
$guid[] = $item->getElementsByTagName("guid")->item(0)->nodeValue;
}
?>
Untested, but the arrays $title, $description, $link, $pubDate and $guid should then each hold one entry per <item>, with all of the data that you need.
EDIT:
OK so another approach:
<?php
$xmlString = file_get_contents("URL TO RSS/XML FILE");
// "#" delimiters avoid escaping "/" in closing tags; lazy ".*" with the s modifier stops at the first closing tag
preg_match_all("#<title>(.*?)</title>#s", $xmlString, $m);
$titles = $m[1];
preg_match_all("#<description>(.*?)</description>#s", $xmlString, $m);
$descriptions = $m[1];
preg_match_all("#<link>(.*?)</link>#s", $xmlString, $m);
$links = $m[1];
preg_match_all("#<pubDate>(.*?)</pubDate>#s", $xmlString, $m);
$pubDates = $m[1];
preg_match_all("#<guid>(.*?)</guid>#s", $xmlString, $m);
$guids = $m[1];
?>
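Each variable is then an array of the matched values. Note that this also matches the channel-level <title>, <description> and <link>, and entities are not decoded, so the DOM-based answers are more robust.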
