Error when merging two XML documents using XPath & DOMDocument

About a year ago I wrote a jQuery-inspired library which allowed you to manipulate the DOM using PHP's XPath and DOMDocument. I recently wanted to clean it up and post it as an open source project. I've been spending the past few days making improvements and implementing some more of PHP's native OO features.
Anyhow, I thought I'd add a new method which allows you to merge a separate XML document with the current one. The catch here is that this method asks for 2 XPath expressions. The first one fetches the elements you want to merge into the existing document. The second specifies the destination path of these merged elements.
The method works well in fetching matching elements from both paths, but I'm having issues with importing the foreign elements into the current DOM. I keep getting the dreaded 'Wrong Document Error' message.
I thought I knew what I was doing, but I suppose I was wrong. If you look at the following code, you can see that I'm first iterating through the current document's matching elements, then through the foreign document's matching elements.
The second, nested loop is where I attempt to merge each foreign element into the destination path in the current document.
I'm not sure what I'm doing wrong here, as I'm clearly importing the foreign node into the current document before appending it:
public function merge($source, $path_origin, $path_destination)
{
    $Dom = new self;
    if(false == $Dom->loadXml($source))
    {
        throw new DOMException('XML source could not be loaded into the DOM.');
    }
    $XPath = new DOMXPath($Dom);
    foreach($this->path($path_destination, true) as $Destination)
    {
        if(false == in_array($Destination->nodeName, array('#text', '#document')))
        {
            foreach($XPath->query($path_origin) as $Origin)
            {
                if(false == in_array($Destination->nodeName, array('#text', '#document')))
                {
                    $this->importNode($Origin, true);
                    $Destination->appendChild($Origin->cloneNode(true));
                }
            }
        }
    }
    return $this;
}
You can find the library in its entirety in the following Github repo:
http://github.com/wilhelm-murdoch/DomQuery
Halps!!!

importNode doesn't "change" the node so it belongs to another document. It creates a new node belonging to the new document and returns it. So you should be getting its return value and using that in appendChild.
$Destination->appendChild($this->importNode($Origin, true));
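To make the point about the return value concrete, here is a minimal standalone sketch using plain DOMDocument and DOMXPath. The two sample documents and the XPath expressions are made up for illustration and are not taken from the library in the question.

```php
<?php
// Hypothetical sample documents (assumptions for this sketch).
$target = new DOMDocument();
$target->loadXML('<root><items/></root>');

$source = new DOMDocument();
$source->loadXML('<feed><item>one</item><item>two</item></feed>');

$targetXPath = new DOMXPath($target);
$sourceXPath = new DOMXPath($source);

foreach ($targetXPath->query('//items') as $destination) {
    foreach ($sourceXPath->query('//item') as $origin) {
        // importNode() leaves $origin untouched and returns a copy owned by $target.
        $imported = $target->importNode($origin, true);
        // Append the returned copy, not the original foreign node.
        $destination->appendChild($imported);
    }
}

echo $target->saveXML($target->documentElement), "\n";
// prints: <root><items><item>one</item><item>two</item></items></root>
```

The original `$origin` nodes still belong to `$source` afterwards, which is why appending them (or a clone of them) directly is rejected.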

Related

Using XSD schema validation for XPath queries

I'm using the following code to create a DOMDocument and validate it against an external xsd file.
<?php
$xmlPath = "/xml/some/file.xml";
$xsdPath = "/xsd/some/schema.xsd";
$doc = new \DOMDocument();
$doc->loadXML(file_get_contents($xmlPath), LIBXML_NOBLANKS);
if (!$doc->schemaValidate($xsdPath)) {
throw new InvalidXmlFileException();
}
Update 2 (rewritten question)
This works fine, meaning that if the XML doesn't match the definitions of XSD it will throw a meaningful exception.
Now, I want to retrieve information from the DOMDocument using XPath. That works fine as well; however, from this point on the DOMDocument is completely detached from the XSD. For example, if I have a DOMNode I cannot tell whether it is of a simpleType or a complexType. I can check whether the node has child nodes (hasChildNodes()), but this is not the same. There is also much more information in the XSD (such as the minimum and maximum number of occurrences).
The question really is: do I have to query the XSD myself, or is there a programmatic way of asking these kinds of questions, e.g. is this DOMNode a complex or a simple type?
In another post it was suggested "to process the schema using a real schema processor, and then use its API to ask questions about the contents of the schema". Does XPath have an API to retrieve information from the XSD, or is there a different convenient way with DOMDocument?
For the record, the original question
Now, I wanted to proceed to parse information from the DOMDocument using XPath. To increase the integrity of the data I'm storing in a database, and to give meaningful error messages to the client, I wanted to use the schema information consistently to validate the queries. That is, I wanted to validate fetched childNodes against the allowed child nodes defined in the XSD. I wanted to do that by using XPath on the XSD document.
However, I stumbled across this post. It basically says that doing this yourself is rather quirky, and that you should instead use a real schema processor and its API to make the queries. If I understand that right, I'm already using a real schema processor with schemaValidate, but what is meant by using its API?
I had already guessed that I'm not using the schema correctly, but I have no idea how to research proper usage.
The question
If I use schemaValidate on the DOMDocument, is that a one-time validation (true or false), or does it stay tied to the DOMDocument afterwards? More precisely, can I also use the validation when adding nodes, or can I use it to select the nodes I'm interested in, as suggested by the referenced SO post?
Update
The question was rated unclear, so I want to try again. Say I would like to add a node or edit a node value. Can I use the schema provided in the XSD to validate the user input? Originally, in order to do that, I wanted to query the XSD manually with another XPath instance to get the specs for a certain node. But as suggested in the linked article, this is not best practice. So the question would be: does the DOM library offer any API for such validation?
Maybe I'm overthinking it. Maybe I just add the node, run the validation again, and see where and why it breaks? In that case, the answer about custom error handling would be correct. Can you confirm?

Your question is not very clear, but it sounds like you want to get detailed reporting about any schema validation failures. While DOMDocument::schemaValidate() only returns a boolean, you can use the libxml error functions to get more detailed information.
We can start with your original code, only changing one thing at the top:
<?php
// without this, errors are echoed directly to screen and/or log
libxml_use_internal_errors(true);
$xmlPath = "file.xml";
$xsdPath = "schema.xsd";
$doc = new \DOMDocument();
$doc->loadXML(file_get_contents($xmlPath), LIBXML_NOBLANKS);
if (!$doc->schemaValidate($xsdPath)) {
throw new InvalidXmlFileException();
}
And then we can make the interesting stuff happen in the exception which is presumably (based on the code you've provided) caught somewhere higher up in the code.
<?php
class InvalidXmlFileException extends \Exception
{
private $errors = [];
public function __construct()
{
foreach (libxml_get_errors() as $err) {
$this->errors[] = self::formatXmlError($err);
}
libxml_clear_errors();
}
/**
* Return an array of error messages
*
* @return array
*/
public function getXmlErrors(): array
{
return $this->errors;
}
/**
* Return a human-readable error message from a libxml error object
*
* @return string
*/
private static function formatXmlError(\LibXMLError $error): string
{
$return = "";
switch ($error->level) {
case \LIBXML_ERR_WARNING:
$return .= "Warning $error->code: ";
break;
case \LIBXML_ERR_ERROR:
$return .= "Error $error->code: ";
break;
case \LIBXML_ERR_FATAL:
$return .= "Fatal Error $error->code: ";
break;
}
$return .= trim($error->message) .
"\n Line: $error->line" .
"\n Column: $error->column";
if ($error->file) {
$return .= "\n File: $error->file";
}
return $return;
}
}
So now when you catch your exception you can just iterate over $e->getXmlErrors():
try {
// do stuff
} catch (InvalidXmlFileException $e) {
foreach ($e->getXmlErrors() as $err) {
echo "$err\n";
}
}
For the formatXmlError function I just copied an example from the PHP documentation that formats the error into something human-readable, but there's no reason you couldn't return structured data or whatever else you like.
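As a rough sketch of the "structured data" variant, and of the "add the node and run the validation again" idea from the question, the following returns libxml errors as plain arrays and validates the same document twice. The inline schema, the `validateAgainst()` helper, and the element names are all assumptions made up for this example; `schemaValidateSource()` is used only so the snippet needs no external XSD file.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical schema: <note> must contain exactly one <to> element.
$xsd = <<<'XSD'
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

/**
 * Hypothetical helper: validate a document and return libxml errors
 * as plain arrays. An empty array means the document is valid.
 */
function validateAgainst(DOMDocument $doc, string $xsd): array
{
    libxml_clear_errors();
    $errors = [];
    if (!$doc->schemaValidateSource($xsd)) {
        foreach (libxml_get_errors() as $err) {
            $errors[] = [
                'level'   => $err->level,
                'code'    => $err->code,
                'message' => trim($err->message),
                'line'    => $err->line,
            ];
        }
    }
    libxml_clear_errors();
    return $errors;
}

$doc = new DOMDocument();
$doc->loadXML('<note><to>Alice</to></note>');
$before = validateAgainst($doc, $xsd);

// Add a node the schema does not allow, then validate the in-memory tree again.
$doc->documentElement->appendChild($doc->createElement('cc', 'Bob'));
$after = validateAgainst($doc, $xsd);

printf("errors before: %d, errors after: %d\n", count($before), count($after));
```

Validation here is a one-off check of the tree as it stands at that moment; nothing keeps the schema attached to the document between calls, so each modification needs its own validation pass.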

I think what you're looking for is the PSVI (post-schema-validation infoset); see this answer for some references.
Another option would be to use XPath 2, which has operators to check schema types.
I don't know whether there are PHP libraries that let you access the PSVI or run XPath 2 queries. In Java there is Xerces for PSVI and Saxon for XPath 2.
For example, with Xerces it is possible to cast a DOM Element to a Xerces ElementPSVI in order to get the schema information of an element.
Be warned that using XPath on the schema (as you were doing) will work only for simple cases. The XML of the schema is very different from the actual schema model (the assembled schema), which is a graph of components with properties. Those properties are indeed calculated from the XML declarations (the schema file), but by very complex rules that are almost impossible to recreate with XPath.
So you need at least the PSVI or XPath 2 queries but, in my experience, obtaining decent validation for application users from an XML schema is difficult.
What are you trying to achieve?

GetElementsByTagName alternative to DOMDocument

I am creating an HTML document with DOMDocument, but I have a problem when searching with the getElementsByTagName method. What I found is that, because I'm generating the document on the fly, it does not recognize the tags that I inserted.
I tried with DOMXPath, but to no avail :S
For now, what I do is go through all the children of a node and store them in an array, but I need to convert that array to a DOMNodeList, and doing
return (DOMNodeList) $my_array;
generates a syntax error.
My specific question is: how can I search for tags with the getElementsByTagName method, or what alternative could achieve the same task?
Keep in mind that I'm generating the DOMDocument on the fly.
If you need more information, I'll gladly place it in the question.
Sure Jonathan Sampson.
I apologize for the way I edited the question. I did not quite understand this forum's format.
For a better understanding of what I do, here is the inheritance chain.
I have this base class:
abstract class ElementoBase {
...
}
And I have this class that inherits from the previous one, with an abstract function insert (insert)
abstract class Elemento extends ElementoBase {
...
public abstract function insertar ( $elemento );
}
Then I have a whole series of classes that represent the HTML tags and inherit from the class above, e.g.
class A extends Elemento {
}
...
Now the code I use to insert the tags into the document is as follows:
public function insertar ( $elemento ) {
$this->getElemento ()->appendChild ( $elemento->getElemento () );
}
where the function getElemento() returns a DOMElement.
Moreover, before inserting the element I do some validations that depend on the HTML tag being inserted, because each tag has very specific rules.
Since I'm generating the HTML code on the fly, there is obviously no HTML file.
To your question, the theory tells me to do this:
$myListTags = $this->getElemento ()->getElementsByTagName ( $tag );
but it always returns null. From what I researched, this is because I'm not loading an HTML file, because if I do
$myHtmlFile = $this->getDocumento ()->loadHTMLFile ( $filename );
$myListTags = $myHtmlFile->getElementsByTagName ( $etiqueta );
I do get back the list of HTML tags.

I am assuming you have created a valid HTML file with DOMDocument. Your basic problem is to parse or search the HTML document for a particular tag name.
To search an HTML file, a good option in PHP is the third-party Simple HTML DOM Parser library.
With that library loaded (it provides file_get_html), you can just run the following code and you are done!
$html = file_get_html('url to your html file');
foreach($html->find('tag name') as $element)
{
// perform the action you want to do here.
// example: echo $element->someproperty;
}

$doc = new DOMDocument('1.0', 'iso-8859-1');
$doc->appendChild(
$doc->createElement('Filiberto', 'It works!')
);
$nodeList = $doc->getElementsByTagName('Filiberto');
var_dump($nodeList->item(0)->nodeValue);
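Building on the snippet above, here is a small sketch of searching inside a document that was built entirely in memory, both from a single element and with DOMXPath. The tag names and variable names are invented for this example; it only assumes that all elements were created by, and attached to, the same DOMDocument.

```php
<?php
// Build a small HTML-like tree in memory; no file is loaded.
$doc  = new DOMDocument('1.0', 'UTF-8');
$body = $doc->appendChild($doc->createElement('body'));
$list = $body->appendChild($doc->createElement('ul'));
$list->appendChild($doc->createElement('li', 'first'));
$list->appendChild($doc->createElement('li', 'second'));
$body->appendChild($doc->createElement('p', 'outside the list'));

// Search below one element only: DOMElement has getElementsByTagName() too.
$items = $list->getElementsByTagName('li');

// The same lookup with XPath, using $list as the context node.
$xpath    = new DOMXPath($doc);
$viaXPath = $xpath->query('.//li', $list);

echo $items->length, ' ', $viaXPath->length, "\n";
// prints: 2 2
```

Both calls already return a DOMNodeList, so there is no need to collect children into an array and cast it.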

saving parts of xml object as object

I created an xml file that stores some information for me. Now I want to get elements that meet some conditions.
At the moment this looks like this:
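function getElements($xmlObject, $name){
    // collect every <feature> whose path contains $name (case-insensitive)
    $aSubFeatures = array();
    foreach($xmlObject->feature as $feature){
        if(stristr($feature->path, $name)){
            array_push($aSubFeatures, $feature);
        }
    }
    return $aSubFeatures;
}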
But I'd prefer getting an object as a return value. I used SimpleXML to load the XML file as an object.
I also tried using DOM (creating a new DOMDocument and trying to append the retrieved feature element objects), but without a reasonable result.
Would deleting all non-matching parts of the XML be a solution? I did not find a way to delete specific elements...
Thanks for your help

To append an element from an existing DOMDocument to a new DOMDocument, you have to call $newdom->importNode($nodeInOldDOM) and append the node it returns. You cannot do a regular appendChild of a node from another document.
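One way this could look when starting from SimpleXML, as in the question: convert each matching element to a DOM node, import it into a fresh document, and hand the result back as a SimpleXML object. The sample XML, the `features` root element, and the `filterFeatures()` name are assumptions for this sketch.

```php
<?php
// Hypothetical XML shaped like the question's data: <feature> elements with a <path> child.
$xml = simplexml_load_string(
    '<features>'
    . '<feature><path>/app/Login</path></feature>'
    . '<feature><path>/app/logout</path></feature>'
    . '<feature><path>/admin/users</path></feature>'
    . '</features>'
);

function filterFeatures(SimpleXMLElement $xmlObject, string $name): SimpleXMLElement
{
    $newDom = new DOMDocument('1.0', 'UTF-8');
    $root   = $newDom->appendChild($newDom->createElement('features'));

    foreach ($xmlObject->feature as $feature) {
        if (stristr((string) $feature->path, $name) !== false) {
            // Convert to a DOM node, copy it into the new document, then append the copy.
            $node = dom_import_simplexml($feature);
            $root->appendChild($newDom->importNode($node, true));
        }
    }

    // Return the filtered document as a SimpleXML object again.
    return simplexml_import_dom($newDom);
}

$matches = filterFeatures($xml, 'log');
echo count($matches->feature), "\n";
// prints: 2
```

The original `$xml` object is left unchanged, so nothing has to be deleted from it.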

PHP DOMNode insertBefore (No Modification Allowed Error)

I've noticed that when attempting to call a DOMNode's insertBefore method where the node to-be-inserted is from another document (i.e. different from the reference node and node being inserted into), the PHP run time generates a DOMException where the message is 'No Modification Allowed Error'.
Documentation seems to be sparse on this issue, although I did see some mention that the node being inserted into is read-only.
The workaround that I've found is to clone the node from the other document and insert the clone. Example:
foreach($nodeChildren as $child) {
$clone = $child->cloneNode(true);
$parentNode->insertBefore($clone, $nodeToInsertInFrontOf);
}
My question is twofold:
1) Why do I have to clone this node in order to perform an insert?
2) Is this the most efficient way of performing this action (assuming that the cloned child node may contain several children and several levels of hierarchy deep of grandchildren)?

By definition, objects inside a DOM only know objects inside its own document. It is a security thing.
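For reference, the documented way to bring a node into another document is DOMDocument::importNode(), which makes a (optionally deep) copy owned by the target document. A minimal sketch with made-up sample documents, reusing the question's `$parentNode` name for illustration:

```php
<?php
// Hypothetical documents for this sketch.
$target = new DOMDocument();
$target->loadXML('<list><entry>c</entry></list>');

$other = new DOMDocument();
$other->loadXML('<list><entry>a</entry><entry>b</entry></list>');

$parentNode = $target->documentElement;
$reference  = $parentNode->firstChild; // <entry>c</entry>

foreach ($other->documentElement->childNodes as $child) {
    // Deep-copy the foreign node (with all descendants) into $target.
    $copy = $target->importNode($child, true);
    // Insert the copy in front of the reference node.
    $parentNode->insertBefore($copy, $reference);
}

echo $target->saveXML($parentNode), "\n";
// prints: <list><entry>a</entry><entry>b</entry><entry>c</entry></list>
```

With the second argument set to true, the whole subtree is copied in one call, so nested children and grandchildren need no separate handling.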

Using DOMXPath to replace a node while maintaining its position

Ok, so I had this neat little idea the other night to create a helper class for DOMDocument that mimics, to some extent, jQuery's ability to manipulate the DOM of an HTML or XML-based string. Instead of CSS selectors, XPath is used. For example:
$Xml->load($source)
->path('//root/items')
->each(function($Context)
{
echo $Context->nodeValue;
});
This would invoke a callback function on every resulting node. Unfortunately, PHP versions before 5.3 don't support lambda functions or closures, so I'm forced to do something a bit more like this for the time being:
$Xml->load($source)
->path('//root/items')
->walk('printValue', 'param1', 'param2');
Everything is working great at the moment and I think this project would be useful to a lot of people, but I'm stuck with one of the functions. I am attempting to mimic jQuery's 'replace' method. I can accomplish this quite easily with the following call:
$Xml->load($source)
->path('//root/items')
->replace($Xml->createElement('foo', 'bar')); // can be an object, string or XPath pattern
The code behind this method is:
public function replace($Content)
{
foreach($this->results as $Element)
{
$Element->parentNode->appendChild($Content->cloneNode(true));
$Element->parentNode->removeChild($Element);
}
return $this;
}
Now, this works. It replaces every resulting element with a cloned version of $Content. The problem is that it adds them to the bottom of the parent node's list of children. The question is, how do I clone this element to replace other elements, while still retaining the original position in the DOM?
I was thinking about reverse-engineering the node I was to replace. Basically, copying over values, attributes and element name from $Content, but I am unable to change the actual element name of the target element.
Reflection could be a possibility, but there's gotta be an easier way to do this.
Anybody?

Use replaceChild instead of appendChild/removeChild.
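A short sketch of what that could look like outside the helper class, using plain DOMDocument and DOMXPath. The sample XML is invented for illustration; the `foo`/`bar` element mirrors the question's example.

```php
<?php
// Hypothetical sample document with siblings around each <items> element.
$doc = new DOMDocument();
$doc->loadXML('<root><a/><items>1</items><b/><items>2</items><c/></root>');

$xpath   = new DOMXPath($doc);
$content = $doc->createElement('foo', 'bar');

// DOMXPath::query() returns a static list, so replacing nodes while looping is safe here.
foreach ($xpath->query('//root/items') as $element) {
    // replaceChild(new, old) puts the clone exactly where $element was.
    $element->parentNode->replaceChild($content->cloneNode(true), $element);
}

echo $doc->saveXML($doc->documentElement), "\n";
// prints: <root><a/><foo>bar</foo><b/><foo>bar</foo><c/></root>
```

Each match gets its own clone of `$content`, since a single node can only live in one place in the tree.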

Check whether $Element has a nextSibling before removing it. If so, do an insertBefore on that next sibling; otherwise simply append.
public function replace($Content)
{
foreach($this->results as $Element)
{
if ($Element->nextSibling) {
$NextSiblingReference = $Element->nextSibling;
$Element->parentNode->insertBefore($Content->cloneNode(true),$NextSiblingReference);
}
else {
$Element->parentNode->appendChild($Content->cloneNode(true));
}
$Element->parentNode->removeChild($Element);
}
return $this;
}
Totally untested though.
Or, as AnthonyWJones suggested, use replaceChild. Big oomph, how did I miss that one :)
