I'd like to search for nodes with the same node name in a SimpleXML Object no matter how deep they are nested and create an instance of them as an array.
In the HTML DOM I can do that with JavaScript by using getElementsByTagName(). Is there a way to do that in PHP as well?
Yes, use XPath:
$xml->xpath('//div');
Here $xml is your SimpleXML object.
In this example you will get an array of all 'div' elements, at any depth.
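A minimal sketch of the xpath() call above, assuming a small made-up document (the element names and id values are hypothetical and only there for illustration). It shows that nested elements are matched too and that each entry of the returned array is itself a SimpleXMLElement:

```php
<?php
// Hypothetical sample document; the structure is for illustration only.
$xml = simplexml_load_string(
    '<root><div id="a"><section><div id="b">inner</div></section></div><p>text</p></root>'
);

// '//div' matches every <div> element, no matter how deeply it is nested.
$divs = $xml->xpath('//div');

$ids = [];
foreach ($divs as $div) {
    // Each entry is a SimpleXMLElement, so attributes are read with array syntax.
    $ids[] = (string) $div['id'];
}

echo implode(',', $ids), "\n";
```

Because the entries are ordinary SimpleXMLElement objects, you can keep navigating from them (children, attributes, further xpath() calls).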
$fname = dirname(__FILE__) . '\\xml\\crRoll.xml';
$dom = new DOMDocument;
$dom->load($fname, LIBXML_DTDLOAD|LIBXML_DTDATTR);
$root = $dom->documentElement;
$xpath = new DOMXpath($dom);
$xpath->registerNamespace('cr', "http://www.w3.org/1999/xhtml");
$candidateNodes = $xpath->query("//cr:break");
foreach ($candidateNodes as $child) {
    $max = $child->getAttribute('tstamp');
}
This finds all the break nodes using XPath and reads the tstamp attribute of each one.
getElementsByTagName() is only available on DOMDocument (and DOMElement).
However, you can import/export SimpleXML to and from DOMDocument,
or simply use DOMDocument to parse the XML.
Another answer mentions XPath.
Note that //div also matches nested elements, so it returns both the outer and the inner div if you have something like:
<div><div>1</div></div>
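The import/export route can be sketched as follows. This is only an illustration, assuming the nested div snippet above is wrapped in a hypothetical root element; dom_import_simplexml() and simplexml_import_dom() are the standard conversion functions:

```php
<?php
// The nested <div> example, wrapped in a hypothetical <root> element.
$sxml = simplexml_load_string('<root><div><div>1</div></div></root>');

// dom_import_simplexml() returns the DOMElement for the SimpleXML node.
$root = dom_import_simplexml($sxml);

// getElementsByTagName() searches all descendants, like its JavaScript namesake.
$divs = $root->getElementsByTagName('div');

// Both the outer and the inner <div> are in the list.
echo $divs->length, "\n";

// Convert each match back to SimpleXML if that API is preferred.
$asSimpleXml = [];
foreach ($divs as $div) {
    $asSimpleXml[] = simplexml_import_dom($div);
}
```

Both conversions work on the same underlying document, so no XML is re-parsed.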
I'm using this in SimpleXML and PHP:
foreach ($xml->children() as $node) {
    echo $node->attributes('namespace')->id;
}
That prints the id attribute of all nodes (using a namespace).
But now I want to know the line number that $node is located in the XML file.
I need the line number, because I'm analyzing the XML file, and returning to the user information of possible issues to resolve them. So I need to say something like: "Here you have an error at line X". I'm sure that the XML file would be in a standard format that will have enough line breaks for this to be useful.
It is possible with DOM. DOMNode provides the function getLineNo().
DOM
$xml = <<<'XML'
<foo>
<bar/>
</foo>
XML;
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
var_dump(
    $xpath->evaluate('//bar[1]')->item(0)->getLineNo()
);
Output:
int(2)
SimpleXML
SimpleXML is based on DOM, so you can convert SimpleXMLElement objects to DOMElement objects.
$element = new SimpleXMLElement($xml);
$node = dom_import_simplexml($element->bar);
var_dump($node->getLineNo());
And yes, most of the time if you have a problem with SimpleXML, the answer is to use DOM.
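Applied to the use case from the question (reporting problems by line), the conversion could look like the sketch below. The sample XML, the "missing id" rule and the message text are assumptions made for illustration; only dom_import_simplexml() and getLineNo() come from the answer above:

```php
<?php
// Hypothetical input: the second <item> lacks the id attribute.
$xml = <<<'XML'
<items>
  <item id="1"/>
  <item/>
  <item id="3"/>
</items>
XML;

$root = new SimpleXMLElement($xml);

$problems = [];
foreach ($root->children() as $node) {
    if (!isset($node['id'])) {
        // Convert the SimpleXML node to its DOM counterpart to read the line number.
        $line = dom_import_simplexml($node)->getLineNo();
        $problems[] = "Missing id attribute at line $line";
    }
}

print_r($problems);
```

The line numbers refer to the parsed source, so they are only as useful as the line breaks in the file.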
XMLReader
XMLReader tracks the line numbers internally, but there is no direct method to access them. Again, you have to convert the current node into a DOMNode, here with expand(). This works because both extensions use libxml2. expand() reads the node and all its descendants into memory, so be careful with it.
$reader = new XMLReader();
$reader->open('data://text/xml;base64,'.base64_encode($xml));
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'bar') {
        var_dump($reader->expand()->getLineNo());
    }
}
Is there a way to create my own DOMNodeList? E.g.:
$doc = new DOMDocument();
$elem = $doc->createElement('div');
$nodeList = new DOMNodeList;
$nodeList->addItem($elem); // ?
My idea is to extend DOMDocument class adding some useful methods that return data as DOMNodeList.
Is it possible to do it without writing my own version of DOMNodeList class?
You cannot add items to a DOMNodeList via its public interface. However, DOMNodeLists are live collections when connected to a DOM tree, so appending a child element to a node adds that element to the node's child collection (which is a DOMNodeList):
$doc = new DOMDocument();
$nodelist = $doc->childNodes; // a DOMNodeList
echo $nodelist->length; // 0
$elem = $doc->createElement('div');
$doc->appendChild($elem);
echo $nodelist->length; // 1
You say you want to add "some useful methods that return data as DOMNodeList". In the context of DOMDocument, this is what XPath does. It allows you to query all the nodes in the document and return them in a DOMNodeList. Maybe that's what you are looking for.
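One possible shape for such a helper is sketched below. The class name QueryableDocument and the method name select() are hypothetical; the sketch only wraps DOMXPath::query(), which already returns a DOMNodeList:

```php
<?php
// Hypothetical subclass; the class and method names are illustrative only.
class QueryableDocument extends DOMDocument
{
    /**
     * Returns all nodes matching an XPath expression as a DOMNodeList.
     * An invalid expression makes query() return false, which the
     * return type turns into a TypeError.
     */
    public function select(string $expression): DOMNodeList
    {
        $xpath = new DOMXPath($this);
        return $xpath->query($expression);
    }
}

$doc = new QueryableDocument();
$doc->loadXML('<root><div/><p><div/></p></root>');

$divs = $doc->select('//div');
echo $divs->length, "\n";
```

This keeps DOMNodeList as the return type without having to construct one by hand.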
I'm using PHP/Zend to load html into a DOM, and then I get a specific div id that I want to modify.
$dom = new Zend_Dom_Query($html);
$element = $dom->query('div[id="someid"]');
How do I modify the text/content/html displayed inside that $element div, and then save the changes to the $dom or $html so I can print the modified html. Any idea how to do this?
Zend_Dom_Query is tailored just for querying a dom, so it doesn't provide an interface in and of itself to alter the dom and save it, but it does expose the PHP Native DOM objects that will let you do so. Something like this should work:
$dom = new Zend_Dom_Query($html);
$document = $dom->getDocument();
$elements = $dom->query('div[id="someid"]');
foreach ($elements as $element) {
    // $element is an instance of DOMElement (http://www.php.net/DOMElement)
    // New nodes have to be created from the owning document
    $node = $document->createElement("div", "contents of div");
    $element->appendChild($node);
}
$newHtml = $document->saveXml();
Take a look at the PHP Doc for DOMElement to get an idea of how you can alter the dom:
http://www.php.net/DOMElement
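For readers without Zend at hand, the same idea can be sketched with the native DOM classes alone. This is an assumption-laden illustration, not the Zend API: the sample HTML is made up, and DOMXPath stands in for Zend_Dom_Query. It replaces the text of the div rather than appending to it:

```php
<?php
// Hypothetical input markup.
$html = '<html><body><div id="someid">old text</div></body></html>';

$document = new DOMDocument();
$document->loadHTML($html);

$xpath = new DOMXPath($document);
foreach ($xpath->query('//div[@id="someid"]') as $element) {
    // Remove the existing children, then append the new text.
    while ($element->firstChild) {
        $element->removeChild($element->firstChild);
    }
    $element->appendChild($document->createTextNode('new text'));
}

// saveHTML() serializes the modified document as HTML.
$newHtml = $document->saveHTML();
echo $newHtml;
```

Using createTextNode() rather than a string passed to createElement() avoids having to escape special characters such as & yourself.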
I have the following XML structure:
<stores>
  <store>
    <name></name>
    <address></address>
    <custom-attributes>
      <custom-attribute attribute-id="country">Deutschland</custom-attribute>
      <custom-attribute attribute-id="displayWeb">false</custom-attribute>
    </custom-attributes>
  </store>
</stores>
How can I get the value of "displayWeb"?
The best solution for this is to use PHP DOM. You can loop through all stores:
$dom = new DOMDocument();
$dom->loadXML( $yourXML);
// With child nodes (note: this also includes whitespace text nodes):
$storeNodes = $dom->documentElement->childNodes;
// Or with XPath:
$xPath = new DOMXPath( $dom);
$storeNodes = $xPath->query( '/stores/store');
// With XPath, $storeNodes contains DOMElements which are equivalent to this array:
// 0 => <store><name></name>....</store>
// 1 => <store><name>Another store not shown in your XML</name>....</store>
These use either the childNodes property of DOMElement or DOMXPath. Once you have all stores, you can iterate through them with a foreach loop and collect the custom attributes into an associative array with getElementsByTagName:
foreach( $storeNodes as $node){
    // Skip text nodes, which childNodes also returns; the calls below need a DOMElement
    if( !$node instanceof DOMElement){
        continue;
    }
    // Of course you can use XPath instead of getElementsByTagName, but this is
    // more efficient
    $domAttrs = $node->getElementsByTagName( 'custom-attribute');
    $attributes = array();
    foreach( $domAttrs as $domAttr){
        $attributes[ $domAttr->getAttribute( 'attribute-id')] = $domAttr->nodeValue;
    }
    // $attributes = array( 'country' => 'Deutschland', 'displayWeb' => 'false');
}
Or select the attribute directly with XPath:
// Inside foreach($storeNodes as $node) loop
$yourAttribute = $xPath->query( "custom-attributes/custom-attribute[@attribute-id='displayWeb']", $node)
    ->item(0)->nodeValue; // Warning: item(0) is null when the desired tag is missing
Or, when you need just one value from the whole document, you could use (as Kirill Polishchuk suggested):
$yourAttribute = $xPath->query( "/stores/store/custom-attributes/custom-attribute[@attribute-id='displayWeb']")
    ->item(0)->nodeValue; // Warning: item(0) is null when the desired tag is missing
Study the manual carefully to understand which type each call returns and what each property contains.
For example, you can parse the XML with DOM: http://php.net/manual/en/book.dom.php
You can use XPath:
stores/store/custom-attributes/custom-attribute[@attribute-id='displayWeb']
I'd suggest PHP's SimpleXML. Its manual page has lots of user-supplied examples showing how to extract values from the parsed data.
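A short sketch of the SimpleXML route, using the XML from the question and the XPath expression given in the answers. The variable names are arbitrary, and the null fallback for a missing attribute is an assumption about what the caller wants:

```php
<?php
$xml = <<<'XML'
<stores>
  <store>
    <name></name>
    <address></address>
    <custom-attributes>
      <custom-attribute attribute-id="country">Deutschland</custom-attribute>
      <custom-attribute attribute-id="displayWeb">false</custom-attribute>
    </custom-attributes>
  </store>
</stores>
XML;

$stores = simplexml_load_string($xml);

// xpath() returns an array of matching SimpleXMLElement objects.
$matches = $stores->xpath(
    "/stores/store/custom-attributes/custom-attribute[@attribute-id='displayWeb']"
);

// Fall back to null when nothing matched instead of dereferencing a missing entry.
$displayWeb = $matches ? (string) $matches[0] : null;

var_dump($displayWeb);
```

Note that the value is the string "false", not a boolean, so compare it with === 'false' rather than testing it for truthiness.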
I'm just getting started with PHP's DOMDocument and am having a little trouble.
How would I select all link nodes under a specific node? For example,
in jQuery I could simply do $('h5 > a')
and this would give me all the links under h5.
How would I do this in PHP using DOMDocument methods?
I tried using phpQuery, but for some reason it can't read the HTML page I'm trying to parse.
As far as I know, any node that jQuery's CSS-style selectors can match can also be selected with XPath.
h5 > a means: select any a node whose direct parent node is h5. This can easily be translated to an XPath query: //h5/a.
So, using DOMDocument:
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//h5/a');
foreach ($nodes as $node) {
    // do stuff
}
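To make the difference between a direct child and any descendant concrete, here is a small sketch with made-up HTML (the markup and the href values are hypothetical). //h5/a corresponds to the selector h5 > a, while //h5//a corresponds to h5 a:

```php
<?php
// Hypothetical markup: one direct link, one unrelated link, one nested link.
$html = '<html><body>'
      . '<h5><a href="/one">One</a></h5>'
      . '<p><a href="/two">Two</a></p>'
      . '<h5><span><a href="/three">Three</a></span></h5>'
      . '</body></html>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Direct children only, like the selector "h5 > a".
$direct = [];
foreach ($xpath->query('//h5/a') as $link) {
    $direct[] = $link->getAttribute('href');
}

// Any descendant, like the selector "h5 a".
$nested = [];
foreach ($xpath->query('//h5//a') as $link) {
    $nested[] = $link->getAttribute('href');
}

print_r($direct);
print_r($nested);
```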
Retrieve the DOMElement whose children you are interested in and call DOMElement::getElementsByTagName on it.
Get all a tags from it and loop through each one, checking if its parent is an h5 tag.
// ...
$links = $document->getElementsByTagName('a');
$correct_tags = array();
foreach ($links as $link) {
    if ($link->parentNode->nodeName == 'h5') {
        $correct_tags[] = $link;
    }
}
// do something with $correct_tags