Here is my xml:
<news_item>
<title>TITLE</title>
<content>CONTENT</content>
<date>DATE</date>
</news_item>
I want to get the names of the tags inside of news_item.
Here is what I have so far:
$dom = new DOMDocument();
$dom->load($file_name);
$results = $dom->getElementsByTagName('news_item');
Without using other PHP libraries such as SimpleXML, can I get the names (not the values) of all the child tags?
Example solution
title, content, date
I don't know the name of the tags inside of news_item, only the container tag name 'news_item'
Thanks guys!
Try this:
foreach ($results as $node)
{
    foreach ($node->childNodes as $child)
    {
        // Skip whitespace text nodes and comments; keep element children only.
        if ($child->nodeType === XML_ELEMENT_NODE)
        {
            echo $child->nodeName, ', ';
        }
    }
}
Should work. Using something similar currently, though for html not xml.
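// $results is a DOMNodeList, so pick one news_item element from it first.
// Note that '*' matches every descendant element, not only direct children.
$nodelist = $results->item(0)->getElementsByTagName('*');
for ($i = 0; $i < $nodelist->length; $i++)
    echo $nodelist->item($i)->nodeName;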
[Previous incorrect answer redacted]
For what it's worth though, there's no cost to using simplexml_import_dom() to make a SimpleXMLElement out of a DOMElement. Both are just object interfaces into an underlying libxml2 data structure. You can even make a change to the DOMElement and see it reflected in the SimpleXMLElement or vice versa. So it doesn't have to be an either/or choice.
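A minimal sketch of that point, assuming a small inline document (the XML string and variable names are made up for illustration). It wraps a DOMElement in a SimpleXMLElement, reads the child tag names through SimpleXML, and then shows that a change made through DOM is visible through SimpleXML too:

```php
<?php
// Hypothetical sample document, used only for illustration.
$dom = new DOMDocument();
$dom->loadXML('<news_item><title>TITLE</title><date>DATE</date></news_item>');

// Wrap the DOM root element in a SimpleXMLElement; the XML is not parsed again.
$sx = simplexml_import_dom($dom->documentElement);

// Collect the names of the element children through the SimpleXML interface.
$names = [];
foreach ($sx->children() as $child) {
    $names[] = $child->getName();
}
echo implode(', ', $names), PHP_EOL;

// Add a third child through DOM, then count the children through SimpleXML.
$dom->documentElement->appendChild($dom->createElement('content', 'CONTENT'));
echo count($sx->children()), PHP_EOL;
```

Which interface to iterate with is then mostly a matter of taste; both views operate on the same tree.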
Related
I'm trying to modify an XML document which contains some node that I can identify by name. For example, I might want to modify the <abc>some text</abc> node in a document (which I can identify by the tag name abc)
The problem I'm facing currently is that I don't know the exact structure of this document. I don't know what the root element is called and I don't know which children might contain this <abc> node.
I tried using SimpleXML<...> but this does not allow me to read arbitrary element children:
$xml = new SimpleXMLElement($xmlString);
foreach ($xml->children() as $child) {
// code here doesn't execute
}
I'm considering building my own XML parser which would have this simple functionality, but I cannot believe that simply iterating over all child nodes of a node (eventually recursively) is not something that is supported by PHP. Hopefully someone can tell me what I'm missing. Thanks!
Use DOMDocument
$dom = new DOMDocument();
$dom->loadXML($xmlString);
foreach ($dom->getElementsByTagName('item') as $item) {
    if ($item->hasChildNodes()) {
        foreach ($item->childNodes as $i) {
            // your code here
        }
    }
}
I found the solution moments after posting, after being stuck on it for a while..
SimpleXML<...> does not have these features, but the DOMDocument and associated classes do;
$dom = new DOMDocument();
$dom->loadXml($xmlString);
foreach($dom->childNodes as $child) {
if ($child->nodeName == "abc") {
$child->textContent = "modified text content";
}
}
Documentation for future reference, here: http://php.net/manual/en/book.dom.php
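Note that $dom->childNodes only covers the top level of the document. If the abc node can sit at any depth, a sketch along the following lines may be closer to what the question asks for. The sample document is hypothetical, and nodeValue is used here in place of textContent:

```php
<?php
// Hypothetical document: one <abc> is nested two levels down, another one level down.
$xmlString = '<root><section><abc>some text</abc></section><abc>more text</abc></root>';

$dom = new DOMDocument();
$dom->loadXML($xmlString);

// getElementsByTagName() searches all descendants, so neither the root
// element's name nor the path down to <abc> has to be known in advance.
foreach ($dom->getElementsByTagName('abc') as $node) {
    $node->nodeValue = 'modified text content';
}

echo $dom->saveXML();
```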
Thanks for your help.
I have a PHP page where I need to parse some XML.
This is what I have done, for example:
$hotelNodes = $xml_data->getElementsByTagName('Hotel');
foreach ($hotelNodes as $hotel) {
    $supplementsNodes2 = $hotel->getElementsByTagName('BoardBase');
    foreach ($supplementsNodes2 as $suppl2) {
        echo '<p>HERE</p>'; // never reached
    }
}
In this code I access each hotel in my XML, and for each hotel I would like to find the BoardBase tags, but the inner loop is never entered.
This is my XML (with many parts cut out):
<hotel desc="DESC" name="Hotel">
<selctedsupplements>
<boardbases>
<boardbase bbpublishprice="0" bbprice="0" bbname="Colazione Continentale" bbid="1"></boardbase>
</boardbases>
</selctedsupplements>
</occupancy></occupancies>
</hotel>
Many hotel nodes don't have a BoardBase at all, but even when one is present the loop is not entered.
Is it possible that this node isn't accessible?
The XML is received from a server through a SoapClient.
If I inspect the XML printed in Firebug, I can see the node, but it is rendered with reduced opacity.
I have also tried this:
$supplementsNodes2 = $hotel->getElementsByTagName('boardbase');
but without success
2 issues I can see from the get-go: XML names are case-sensitive, hence:
$hotelNodes = $xml_data->getElementsByTagName('Hotel');
Can't work, because your xml node looks like:
<hotel desc="DESC" name="Hotel">
hotel => lower-case!
As you can see here:
[...] names for such things as elements, while XML is explicitly case sensitive.
The official specs specify tag names as case-sensitive, so getElementsByTagName('FOO') won't return the same elements as getElementsByTagName('foo')...
Secondly, you seem to have some tag-soup going on:
</occupancy></occupancies>
<!-- tag names don't match, both are closing tags -->
This is just plain invalid markup, it should read either:
<occupancy></occupancy>
or
<occupancies></occupancies>
That would be the first 2 ports of call.
I've set up a quick codepad using this code, which you can see here:
$xml = '<hotel desc="DESC" name="Hotel">
<selctedsupplements>
<boardbases>
<boardbase bbpublishprice="0" bbprice="0" bbname="Colazione Continentale" bbid="1"></boardbase>
</boardbases>
</selctedsupplements>
<occupancy></occupancy>
</hotel>';
$dom = new DOMDocument;
$dom->loadXML($xml);
$badList = $dom->getElementsByTagName('Hotel');
$correctList = $dom->getElementsByTagName('hotel');
echo sprintf("%d", $badList->length),
' compared to ',
$correctList->length, PHP_EOL;
The output was "0 compared to 1", meaning that using a lower-case selector returned 1 element, the one with the upper-case H returned an empty list.
To get to the boardbase tags for each hotel tag, you just have to write this:
$hotels = $dom->getElementsByTagName('hotel');
foreach($hotels as $hotel)
{
$supplementsNodes2 = $hotel->getElementsByTagName('boardbase');
foreach($supplementsNodes2 as $node)
{
var_dump($node);//you _will_ get here now
}
}
As you can see on this updated codepad.
Alessandro, your XML is a mess (=un casino), you really need to get that straight. Elias' answer pointed out some very basic stuff to consider.
I built on the codepad Elias set up, and it works perfectly for me:
$dom = new DOMDocument;
$dom->loadXML($xml);
$hotels = $dom->getElementsByTagName('hotel');
foreach ($hotels as $hotel) {
$bbs = $hotel->getElementsByTagName('boardbase');
foreach ($bbs as $bb) echo $bb->getAttribute('bbname');
}
see http://codepad.org/I6oxkEOC
I'm using SimpleXML & PHP to parse an XML element in the following form:
<element>
random text with <inlinetag src="http://url.com/">inline</inlinetag> XML to parse
</element>
I know I can reach inlinetag using $element->inlinetag, but I don't know how to reach it in such a way that I can replace the inlinetag with a link to the source attribute without relying on its location in the text. The result would have to look like this:
here is a random text with inline XML
This may be a stupid question, I hope someone here can help! :)
I found a way to do this using DOMElement.
One way to replace the element is by cloning it with a different name/attributes. Here is a way to do this, using the accepted answer given on How do you rename a tag in SimpleXML through a DOM object?
function clonishNode(DOMNode $oldNode, $newName, $replaceAttrs = [])
{
$newNode = $oldNode->ownerDocument->createElement($newName);
foreach ($oldNode->attributes as $attr)
{
if (isset($replaceAttrs[$attr->name]))
$newNode->setAttribute($replaceAttrs[$attr->name], $attr->value);
else
$newNode->appendChild($attr->cloneNode());
}
foreach ($oldNode->childNodes as $child)
$newNode->appendChild($child->cloneNode(true));
$oldNode->parentNode->replaceChild($newNode, $oldNode);
}
Now, we use this function to clone the inline element with a new element and attribute name. Here comes the tricky part: iterating over all the nodes will not work as expected. The length of the selected nodes will change as you clone them, as the original node is removed. Therefore, we only select the first element until there are no elements left to clone.
$xml = '<element>
random text with <inlinetag src="http://url.com/">inline</inlinetag> XML to parse
</element>';
$dom = new DOMDocument;
$dom->loadXML($xml);
$nodes= $dom->getElementsByTagName('inlinetag');
echo $dom->saveXML(); //<element>random text with <inlinetag src="http://url.com/">inline</inlinetag> XML to parse</element>
while($nodes->length > 0) {
clonishNode($nodes->item(0), 'a', ['src' => 'href']);
}
echo $dom->saveXML(); //<element>random text with <a href="http://url.com/">inline</a> XML to parse</element>
That's it! All that's left to do is getting the content of the element tag.
Is this the result you want to achieve?
<?php
$data = '<element>
random text with
<inlinetag src="http://url.com/">inline
</inlinetag> XML to parse
</element>';
$xml = simplexml_load_string($data);
foreach($xml->inlinetag as $resource)
{
echo 'Your SRC attribute = '. $resource->attributes()->src; // e.g. name, price, symbol
}
?>
I want to extract the content of the body of an HTML page along with the tag names of its children. I have taken an example HTML document like this:
<html>
<head></head>
<body>
<h1>This is H1 tag</h1>
<h2>This is H2 tag</h2>
<h3>This is H3 tag</h3>
</body>
</html>
I have implemented the PHP code below and it's working fine.
$d=new DOMDocument();
$d->loadHTMLFile('file.html');
$l=$d->childNodes->item(1)->childNodes->item(1)->childNodes;
for($i=0;$i<$l->length;$i++)
{
echo "<".$l->item($i)->nodeName.">".$l->item($i)->nodeValue."</".$l->item($i)->nodeName.">";
}
This code is working perfectly fine, but when I tried to do this using foreach loop instead of for loop, the nodeName property was returning '#text' with every actual nodeName.
Here is that code
$l=$d->childNodes->item(1)->childNodes->item(1)->childNodes;
foreach ($l as $li) {
echo $li->childNodes->item(0)->nodeName."<br/>";
}
Why so?
When I've had this problem it was fixed by doing the following.
$xmlDoc = new DOMDocument();
$xmlDoc->preserveWhiteSpace = false; // important!
You can trace out your $node->nodeType to see the difference. I get 3, 1, 3 even though there was only one node (child). Turn white space off and now I just get 1.
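A small sketch of that difference, assuming a made-up XML string with one child element surrounded by indentation (childNodeTypes is a hypothetical helper written for this example). The property has to be set before the document is loaded, and this sketch covers loadXML only:

```php
<?php
// Hypothetical input: one child element surrounded by indentation whitespace.
$xml = "<root>\n  <child/>\n</root>";

// Helper made up for this sketch: node types of the root element's children.
function childNodeTypes(string $xml, bool $preserve): array
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = $preserve; // must be set before loading
    $doc->loadXML($xml);

    $types = [];
    foreach ($doc->documentElement->childNodes as $child) {
        $types[] = $child->nodeType; // 1 = element, 3 = text
    }
    return $types;
}

echo implode(', ', childNodeTypes($xml, true)), PHP_EOL;  // 3, 1, 3
echo implode(', ', childNodeTypes($xml, false)), PHP_EOL; // 1
```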
GL.
In DOM, everything is a 'node'. Not just the elements (tags); comments and text between the elements (even if it's just whitespaces or newlines, which seems to be the case in your example) are nodes, too. Since text nodes don't have an actual node name, it's substituted with #text to indicate it's a special kind of node.
Note that your two loops don't read the same node. The for loop prints the nodeName of each child of body (the h1, h2 and h3 elements themselves), whereas the foreach loop prints $li->childNodes->item(0)->nodeName, which is the first child of each heading, and that child is the heading's text node. That is where the '#text' comes from.
Besides nodeName and nodeValue, a DOMNode also has a nodeType property. By checking this property against certain constants you can determine the type of the node and thus filter out unwanted nodes.
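As a rough sketch of such a filter, assuming an inline HTML string shaped like the one in the question (the string and variable names are illustrative), the loop below keeps only the children whose nodeType is XML_ELEMENT_NODE:

```php
<?php
// Illustrative HTML, modelled on the example in the question.
$html = '<html><head></head><body>
<h1>This is H1 tag</h1>
<h2>This is H2 tag</h2>
</body></html>';

$d = new DOMDocument();
$d->loadHTML($html);

// Look the body up by name instead of relying on child positions.
$body = $d->getElementsByTagName('body')->item(0);

$names = [];
foreach ($body->childNodes as $child) {
    // Text nodes (XML_TEXT_NODE) and comments are skipped here.
    if ($child->nodeType === XML_ELEMENT_NODE) {
        $names[] = $child->nodeName;
    }
}
echo implode(', ', $names), PHP_EOL;
```

The same check works inside either a for or a foreach loop, since both walk the same DOMNodeList.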
I'm coming to this a little late, but the best solution for me was different. The issue is that a text node doesn't know its name, but its parent does, so all you need to do is ask the parent for its nodeName to get the key.
$dom = new DOMDocument();
$dom->loadXML($stringXML);
$valorizador = $dom->getElementsByTagName("tagname");
foreach ($valorizador->item(0)->childNodes as $item) {
$childs = $item->childNodes;
$key = $item->nodeName;
foreach ($childs as $i) {
echo $key." => ".$i->nodeValue. "\n";
}
}
I'm using SimpleXML to add a child node within one of my XML documents... when I do a print_r on my SimpleXML object, the < is still displayed as a < in the view source. However, after I save this object back to XML using DOMDocument, the < is converted to &lt; and the > is converted to &gt;
Any ideas on how to change this behavior? I've tried adding dom->substituteEntities = false;, but this did no good.
//Convert SimpleXML element to DOM and save
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = false;
$dom->substituteEntities = false;
$dom->loadXML($xml->asXML());
$dom->save($filename);
Here is where I'm using the <:
$new_hint = '<![CDATA[' . $value[0] . ']]>';
$PrintQuestion->content->multichoice->feedback->hint->Passage->Paragraph->addChild('TextFragment', $new_hint);
The problem is that I'm using SimpleXML to iterate through certain nodes in the XML document, and if an attribute matches a given ID, a specific child node is added with CDATA. Then, after all processing, I save the XML back to file using DOMDocument, which is where the < is converted to &lt;, etc.
Here is a link to my entire class file, so you can get a better idea on what I'm trying to accomplish. Specifically refer to the hint_insert() method at the bottom.
http://pastie.org/1079562
SimpleXML and php5's DOM module use the same internal representation of the document (facilitated by libxml). You can switch between both apis without having to re-parse the document via simplexml_import_dom() and dom_import_simplexml().
I.e. if you really want/have to perform the iteration with the SimpleXML api once you've found your element you can switch to the DOM api and create the CData section within the same document.
<?php
$doc = new SimpleXMLElement('<a>
<b id="id1">a</b>
<b id="id2">b</b>
<b id="id3">c</b>
</a>');
foreach( $doc->xpath('b[@id="id2"]') as $b ) {
$b = dom_import_simplexml($b);
$cdata = $b->ownerDocument->createCDataSection('0<>1');
$b->appendChild($cdata);
unset($b);
}
echo $doc->asxml();
prints
<?xml version="1.0"?>
<a>
<b id="id1">a</b>
<b id="id2">b<![CDATA[0<>1]]></b>
<b id="id3">c</b>
</a>
The problem is that you're likely adding that as a string, instead of as an element.
So, instead of:
$simple->addChild('foo', '<something/>');
which will be treated as text, use:
$child = $simple->addChild('foo');
$child->addChild('something');
You can't have a literal < in the body of the XML document unless it's the opening of a tag.
Edit: After what you describe in the comments, I think you're after:
DOMDocument::createCDATASection()
$child = $dom->createCDataSection('your < cdata > body ');
$dom->appendChild($child);
Edit2: After reading your edit, there's only one thing I can say:
You're doing it wrong... You can't add elements as a string value for another element. Sorry, you just can't. That's why it's escaping things, because DOM and SimpleXML are there to make sure you always create valid XML. You need to create the element as an object... So, if you want to create the CDATA child, you'd have to do something like this:
$child = $PrintQuestion.....->addChild('TextFragment');
$domNode = dom_import_simplexml($child);
$cdata = $domNode->ownerDocument->createCDataSection($value[0]);
$domNode->appendChild($cdata);
That's all there should be to it...