I'm trying to modify an XML document which contains some node that I can identify by name. For example, I might want to modify the <abc>some text</abc> node in a document (which I can identify by the tag name abc)
The problem I'm facing currently is that I don't know the exact structure of this document. I don't know what the root element is called and I don't know which children might contain this <abc> node.
I tried using SimpleXML<...> but this does not allow me to read arbitrary element children:
$xml = new SimpleXMLElement($xmlString);
foreach ($xml->children() as $child) {
// code here doesn't execute
}
I'm considering building my own XML parser which would have this simple functionality, but I cannot believe that simply iterating over all child nodes of a node (eventually recursively) is not something that is supported by PHP. Hopefully someone can tell me what I'm missing. Thanks!
Use DOMDocument
$dom = new DOMDocument();
$dom->loadXML($xmlString);
foreach($dom->getElementsByTagName('item') as $item) {
if ($item->hasChildNodes()) {
foreach($item->childNodes as $i) {
// your code here
}
}
}
I found the solution moments after posting, after being stuck on it for a while.
SimpleXML<...> does not have these features, but DOMDocument and its associated classes do:
$dom = new DOMDocument();
$dom->loadXml($xmlString);
foreach($dom->childNodes as $child) {
if ($child->nodeName == "abc") {
$child->textContent = "modified text content";
}
}
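Note that the loop above only visits the direct children of the document node, which in practice means the root element. When the structure is unknown, a recursive walk reaches every element at any depth. The following is a minimal sketch, not a definitive implementation: the function name visitElements and the sample XML are made up for illustration.

```php
<?php
// Hypothetical helper: calls $visit for every element below $node, depth-first.
function visitElements(DOMNode $node, callable $visit): void
{
    foreach ($node->childNodes as $child) {
        // Skip text, comment and other non-element nodes.
        if ($child instanceof DOMElement) {
            $visit($child);
            visitElements($child, $visit);
        }
    }
}

// Sample input with <abc> at two different depths (assumed for the example).
$xmlString = '<root><wrapper><abc>some text</abc></wrapper><abc>more</abc></root>';

$dom = new DOMDocument();
$dom->loadXML($xmlString);

visitElements($dom, function (DOMElement $element): void {
    if ($element->nodeName === 'abc') {
        // Replaces the element's content with a single text node.
        $element->nodeValue = 'modified text content';
    }
});

$result = $dom->saveXML();
echo $result;
```

If only one tag name matters, $dom->getElementsByTagName('abc') should give the same set of elements without hand-written recursion.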
Documentation for future reference, here: http://php.net/manual/en/book.dom.php
Thanks for your help.
Any PHP developers here?
I have a PHP function that parses an XML file using DOMDocument, a tool I'm proficient with. I want to do the same with XMLReader, but I don't understand how XMLReader works.
I want to use XMLReader because it's a lightweight tool.
Feel free to ask me other questions about my issue.
function getDatas($filepath)
{
$doc = new DOMDocument();
$xmlfile = file_get_contents($filepath);
$doc->loadXML($xmlfile);
$xmlcars = $doc->getElementsByTagName('car');
$mycars= [];
foreach ($xmlcars as $xmlcar) {
$car = new Car();
$car->setName(
$xmlcar->getElementsByTagName('id')->item(0)->nodeValue
);
$car->setBrand(
$xmlcar->getElementsByTagName('brand')->item(0)->nodeValue
);
array_push($mycars, $car);
}
return $mycars;
}
PS : I'm not a senior PHP dev.
Ahah Thanks.
This is a good example from this topic, I hope it helps you to understand.
$z = new XMLReader;
$z->open('data.xml');
$doc = new DOMDocument;
// move to the first <product /> node
while ($z->read() && $z->name !== 'product');
// now that we're at the right depth, hop to the next <product/> until the end of the tree
while ($z->name === 'product')
{
// either one should work
//$node = new SimpleXMLElement($z->readOuterXML());
$node = simplexml_import_dom($doc->importNode($z->expand(), true));
// now you can use $node without going insane about parsing
var_dump($node->element_1);
// go to next <product />
$z->next('product');
}
XMLReader does not, as far as I can tell, have an equivalent way of filtering by element name. So the closest equivalent would be, as mentioned in rvbarreto's answer, to iterate through all elements using XMLReader->read() and grab the info you need whenever the element name matches the one you want.
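As a rough sketch of that read-and-match loop, applied to the car data from the question: the Car class below is a minimal stand-in (the real one is not shown in the question), the function name getCarsWithReader is made up, and the XML is passed as a string to keep the example self-contained. For a file, $reader->open($filepath) would take the place of $reader->XML($xml).

```php
<?php
// Minimal stand-in for the asker's Car class (its real shape is an assumption).
final class Car
{
    public string $name = '';
    public string $brand = '';
}

function getCarsWithReader(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $doc = new DOMDocument();
    $cars = [];

    // read() advances node by node; we only act on opening <car> elements.
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'car') {
            // expand() builds a DOM node for just this <car> subtree.
            $carNode = $doc->importNode($reader->expand(), true);

            $car = new Car();
            $car->name = $carNode->getElementsByTagName('id')->item(0)?->nodeValue ?? '';
            $car->brand = $carNode->getElementsByTagName('brand')->item(0)?->nodeValue ?? '';
            $cars[] = $car;
        }
    }

    $reader->close();
    return $cars;
}

$cars = getCarsWithReader(
    '<cars><car><id>1</id><brand>Fiat</brand></car><car><id>2</id><brand>Saab</brand></car></cars>'
);
```

Only one <car> subtree is held as a DOM tree at a time, which is where the memory saving over loading the whole document would come from.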
Alternatively, you might want to check out SimpleXML, which supports filtering with XPath expressions and lets you reach a node by following the element structure, as if child elements were properties of the parent object. For instance, instead of using:
$xmlcar->getElementsByTagName('id')->item(0)->nodeValue;
You would use:
$xmlcar->id[0];
Assuming all of your car elements are at the first level of the XML document tree, the following should work as an example:
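function getDatas($filepath) {
    // third argument TRUE: $filepath is a path or URL, not an XML string
    $carsData = new SimpleXMLElement($filepath, 0, TRUE);
    $mycars = [];
    foreach ($carsData->car as $xmlcar) {
        $car = new Car();
        // cast to string so the setters receive plain text, not SimpleXMLElement objects
        $car->setName((string) $xmlcar->id[0]);
        $car->setBrand((string) $xmlcar->brand[0]);
        $mycars[] = $car;
    }
    return $mycars;
}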
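If the car elements are not all at the first level, the XPath support mentioned above can find them at any depth. A minimal sketch, assuming a made-up sample document and plain arrays in place of the Car class:

```php
<?php
// Sample document (assumed): one <car> nested inside <lot>, one at the top level.
$xml = new SimpleXMLElement(
    '<garage>'
    . '<lot><car><id>1</id><brand>Fiat</brand></car></lot>'
    . '<car><id>2</id><brand>Saab</brand></car>'
    . '</garage>'
);

$cars = [];
// '//car' matches <car> elements anywhere in the document.
foreach ($xml->xpath('//car') as $xmlcar) {
    $cars[] = [
        'name'  => (string) $xmlcar->id,
        'brand' => (string) $xmlcar->brand,
    ];
}

print_r($cars);
```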
I am currently learning different ways to iterate through the tags of an XML document using the PHP DOMDocument object. I understand the foreach loop for iterating through the tags, but $element->item(0)->childNodes->item(0)->nodeValue is a bit unclear to me. Could somebody explain it to me in detail? Thank you.
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->load('StudentData.xml');
$studentRoot = $xmlDoc->getElementsByTagName('Student');
for ($i = 0; $i < ($studentRoot->length); $i++) {
$firstNameTags = $studentRoot->item($i)->getElementsByTagName('FirstName');
echo $firstNameTags->item(0)->childNodes->item(0)->nodeValue.' <br />';
}
/* so much easier and clear to understand! */
foreach($studentRoot as $node) {
/* For every <Student> tag as a separate node,
step into its child nodes, and for each child,
echo the text content inside */
foreach($node->childNodes as $child) {
echo $child->textContent.'<br />';
}
}
?>
$elements->item(0)->childNodes->item(0)->nodeValue
First:
$elements
The current elements, as parsed and referenced (see the naming note at the end). In the code example, that would be:
$firstNameTags = $studentRoot->item($i)->getElementsByTagName('FirstName');
$firstNameTags->...
Next:
->item(0)
Get a reference to the first item in the $elements node list. Since the list is zero-indexed, ->item(0) returns the first node.
->childNodes
Get a list of the child nodes of that first $elements node referenced by ->item(0) above. As there is no (), this is a (read-only) property of the DOMNode, and its value is a DOMNodeList.
->item(0)
Again, get the first node in the list of child nodes by index.
->nodeValue
The value of the node itself.
If the form of the statement alone:
$obj->method()->method()->prop
Confuses you, look into method chaining, which is what this uses to put all of those method calls together.
Note on naming: you wrote $element without the s, but by convention the plural indicates a collection. So $element would be zero or one element reference, while $elements might hold zero, one, or more of them.
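To make the chain concrete, here is the same expression unrolled into one variable per step. This is only an illustrative sketch: the inline XML stands in for StudentData.xml, whose real contents are not shown in the question, and the variable names are made up.

```php
<?php
$xmlDoc = new DOMDocument();
// Assumed sample data in place of StudentData.xml.
$xmlDoc->loadXML(
    '<Students><Student><FirstName>Ada</FirstName><LastName>Lovelace</LastName></Student></Students>'
);

$students         = $xmlDoc->getElementsByTagName('Student');      // DOMNodeList of <Student>
$student          = $students->item(0);                            // DOMElement: first <Student>
$firstNameTags    = $student->getElementsByTagName('FirstName');   // DOMNodeList of <FirstName>
$firstNameElement = $firstNameTags->item(0);                       // DOMElement: first <FirstName>
$children         = $firstNameElement->childNodes;                 // DOMNodeList of its children
$textNode         = $children->item(0);                            // DOMText: the text inside the tag
$value            = $textNode->nodeValue;                          // the text as a string

echo $value, "\n";
```

For an element that contains only text, $firstNameElement->textContent should give the same string without the last two steps.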
Here is my xml:
<news_item>
<title>TITLE</title>
<content>CONTENT.</content>
<date>DATE</date>
</news_item>
I want to get the names of the tags inside of news_item.
Here is what I have so far:
$dom = new DOMDocument();
$dom->load($file_name);
$results = $dom->getElementsByTagName('news_item');
WITHOUT USING other PHP libraries like SimpleXML, can I get the names (not the values) of all the child tags?
Example solution
title, content, date
I don't know the name of the tags inside of news_item, only the container tag name 'news_item'
Thanks guys!
Try this:
foreach($results as $node)
{
if($node->childNodes->length)
{
foreach($node->childNodes as $child)
{
echo $child->nodeName,', ';
}
}
}
Should work. Using something similar currently, though for html not xml.
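One caveat with XML that is indented across several lines: childNodes also contains the whitespace text nodes between the tags, and those report the node name #text. A minimal sketch that keeps only element nodes, using an inline copy of the question's XML (with the closing tag corrected) instead of a file:

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    "<news_item>\n"
    . "  <title>TITLE</title>\n"
    . "  <content>CONTENT.</content>\n"
    . "  <date>DATE</date>\n"
    . "</news_item>"
);

$names = [];
foreach ($dom->getElementsByTagName('news_item') as $item) {
    foreach ($item->childNodes as $child) {
        // Skip whitespace text nodes and comments; keep real tags only.
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $names[] = $child->nodeName;
        }
    }
}

echo implode(', ', $names), "\n"; // prints: title, content, date
```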
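// $results is a DOMNodeList, so pick one <news_item> element before searching inside it
$nodelist = $results->item(0)->getElementsByTagName('*');
for ($i = 0; $i < $nodelist->length; $i++)
    echo $nodelist->item($i)->nodeName;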
[Previous incorrect answer redacted]
For what it's worth though, there's no cost to using simplexml_import_dom() to make a SimpleXMLElement out of a DOMElement. Both are just object interfaces into an underlying libxml2 data structure. You can even make a change to the DOMElement and see it reflected in the SimpleXMLElement or vice versa. So it doesn't have to be an either/or choice.
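A small sketch of that shared view, assuming a cut-down version of the news_item document from the question: a change made through one interface is read back through the other, and SimpleXML is then used to list the child tag names.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<news_item><title>TITLE</title><date>DATE</date></news_item>');

// Wrap the existing DOM element; no copy of the document is made.
$sx = simplexml_import_dom($dom->documentElement);

// Change via DOM, read via SimpleXML.
$dom->getElementsByTagName('title')->item(0)->nodeValue = 'CHANGED VIA DOM';
$seenBySimpleXml = (string) $sx->title;

// Change via SimpleXML, read via DOM.
$sx->date = 'CHANGED VIA SIMPLEXML';
$seenByDom = $dom->getElementsByTagName('date')->item(0)->nodeValue;

// SimpleXML's children() yields elements only, so no text-node filtering is needed.
$childNames = [];
foreach ($sx->children() as $child) {
    $childNames[] = $child->getName();
}

echo $seenBySimpleXml, "\n", $seenByDom, "\n", implode(', ', $childNames), "\n";
```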
I'm just getting started with PHP's DOMDocument and am having a little trouble.
How would I select all link nodes under a specific node?
In jQuery I could simply do $('h5 > a')
and this would give me all the links under h5.
How would I do this in PHP using DOMDocument methods?
I tried using phpQuery, but for some reason it can't read the HTML page I'm trying to parse.
As far as I know, any node that a jQuery selector can select can also be selected with XPath.
h5 > a means: select any a node whose direct parent is an h5 node. This translates easily to an XPath query: //h5/a.
So, using DOMDocument:
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//h5/a');
foreach ($nodes as $node) {
// do stuff
}
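For a self-contained illustration of the difference between the child selector and a descendant selector, here is a sketch with a made-up HTML snippet: //h5/a corresponds to h5 > a, while //h5//a corresponds to the jQuery selector h5 a.

```php
<?php
// Sample markup (assumed): one link directly under <h5>, one nested deeper, one elsewhere.
$html = '<html><body>'
    . '<h5><a href="/one">One</a> <span><a href="/nested">Nested</a></span></h5>'
    . '<p><a href="/other">Other</a></p>'
    . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Direct children only, like $('h5 > a').
$direct = [];
foreach ($xpath->query('//h5/a') as $a) {
    $direct[] = $a->getAttribute('href');
}

// Any depth below <h5>, like $('h5 a').
$anyDepth = [];
foreach ($xpath->query('//h5//a') as $a) {
    $anyDepth[] = $a->getAttribute('href');
}

print_r($direct);
print_r($anyDepth);
```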
Retrieve the DOMElement whose descendants you are interested in and call DOMElement::getElementsByTagName on it.
Get all a tags from it, and loop through each one, checking whether its parent is an h5 tag.
// ...
$links = $document->getElementsByTagName('a');
$correct_tags = array();
foreach ($links as $link) {
    // keep only links whose direct parent is an <h5>
    if ($link->parentNode->nodeName == 'h5') {
        $correct_tags[] = $link;
    }
}
// do something with $correct_tags
I'm trying to parse an HTML snippet, using the PHP DOM functions. I have stripped out everything apart from paragraph, span and line break tags, and now I want to retrieve all the text, along with its accompanying styles.
So, I'd like to get each piece of text, one by one, and for each one I can then go back up the tree to get the values of particular attributes (I'm only interested in some specific ones, like color etc.).
How can I do this? Or am I thinking about it the wrong way?
Suppose you have a DOMDocument here:
$doc = new DOMDocument();
$doc->loadHTMLFile('http://stackoverflow.com/');
You can find all text nodes using a simple Xpath.
$xpath = new DOMXPath($doc);
$textNodes = $xpath->query('//text()');
Just foreach over it to iterate over all textnodes:
foreach ($textNodes as $textNode) {
echo $textNode->data . "\n";
}
From that, you can go up the DOM tree by using ->parentNode.
Hope that this can give you a good start.
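As a sketch of that upward walk, the helper below looks for the nearest ancestor carrying a given attribute. The function name nearestAttribute and the sample HTML are made up for illustration; the attribute names style and color are taken from the question.

```php
<?php
// Hypothetical helper: value of $attribute on the closest ancestor element, or null.
function nearestAttribute(DOMNode $node, string $attribute): ?string
{
    // The walk stops at the document node, which is not a DOMElement.
    for ($parent = $node->parentNode; $parent instanceof DOMElement; $parent = $parent->parentNode) {
        if ($parent->hasAttribute($attribute)) {
            return $parent->getAttribute($attribute);
        }
    }
    return null;
}

$doc = new DOMDocument();
$doc->loadHTML('<p style="color:red">Hello <span color="blue">world</span><br>again</p>');

$xpath = new DOMXPath($doc);
$pieces = [];
foreach ($xpath->query('//text()') as $textNode) {
    $pieces[] = [
        'text'  => $textNode->data,
        'style' => nearestAttribute($textNode, 'style'),
        'color' => nearestAttribute($textNode, 'color'),
    ];
}

print_r($pieces);
```

Each text piece ends up paired with the style information inherited from its enclosing elements, which seems close to what the question asks for.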
For those who are more comfortable with CSS selectors and are willing to include a single extra PHP class in their project, I would suggest PHP Simple HTML DOM Parser. The solution would look something like the following:
$html = file_get_html('http://www.example.com/');
$ret = $html->find('p, span');
$store = array();
foreach($ret as $element) {
$store[] = array($element->tag => array('text' => $element->innertext,
'color' => $element->color,
'style' => $element->style));
}
print_r($store);