I've noticed that when I call a DOMNode's insertBefore() method and the node to be inserted comes from another document (i.e. a different document from the one that owns the reference node and the node being inserted into), the PHP runtime throws a DOMException with the message 'No Modification Allowed Error'.
Documentation seems to be sparse on this issue, although I did see some mention of the node being inserted into being read-only.
The workaround I've found is to clone the node that comes from the other document and insert the clone. Example:
foreach ($nodeChildren as $child) {
    $clone = $child->cloneNode(true);
    $parentNode->insertBefore($clone, $nodeToInsertInFrontOf);
}
My question is twofold:
1) Why do I have to clone this node in order to perform an insert?
2) Is this the most efficient way of performing this action (assuming that the cloned child node may contain several children and several levels of grandchildren)?
By definition, objects inside a DOM only know about objects inside their own document: every node belongs to exactly one owner document, so a node from another document cannot simply be inserted.
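For question 2, the standard DOM call for bringing a node into another document is importNode() on the destination document, which returns a copy owned by that document (the same method recommended in the answers further down). A minimal sketch, with made-up XML standing in for the asker's documents; $parentNode and $nodeToInsertInFrontOf mirror the question's names, everything else is hypothetical:

```php
<?php
// Sketch: insert nodes from $source into $target via importNode().
// The sample XML below is made up for illustration.
$target = new DOMDocument();
$target->loadXML('<root><marker/></root>');

$source = new DOMDocument();
$source->loadXML('<items><item id="1"><sub>a</sub></item><item id="2"/></items>');

$parentNode = $target->documentElement;
$nodeToInsertInFrontOf = $target->getElementsByTagName('marker')->item(0);

foreach ($source->documentElement->childNodes as $child) {
    // importNode(..., true) returns a deep copy owned by $target,
    // so insertBefore() is no longer handed a foreign node.
    $imported = $target->importNode($child, true);
    $parentNode->insertBefore($imported, $nodeToInsertInFrontOf);
}

echo $target->saveXML($target->documentElement), "\n";
// <root><item id="1"><sub>a</sub></item><item id="2"/><marker/></root>
```

Note that importNode($node, true) still copies the whole subtree, so it is not cheaper than a deep clone; its advantage is that the copy belongs to the destination document.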
I'm using the following code to create a DOMDocument and validate it against an external xsd file.
<?php
$xmlPath = "/xml/some/file.xml";
$xsdPath = "/xsd/some/schema.xsd";
$doc = new \DOMDocument();
$doc->loadXML(file_get_contents($xmlPath), LIBXML_NOBLANKS);
if (!$doc->schemaValidate($xsdPath)) {
    throw new InvalidXmlFileException();
}
Update 2 (rewritten question)
This works fine: if the XML doesn't match the definitions in the XSD, it throws a meaningful exception.
Now I want to retrieve information from the DOMDocument using XPath. That works fine as well; however, from this point on the DOMDocument is completely detached from the XSD! For example, given a DOMNode, I cannot tell whether it is of a simpleType or a complexType. I can check whether the node has child nodes (hasChildNodes()), but that is not the same thing. There is also a lot more information in the XSD (for example, the minimum and maximum number of occurrences).
The real question is: do I have to query the XSD myself, or is there a programmatic way of asking those kinds of questions, i.e. is this DOMNode of a complex or a simple type?
In another post it was suggested "to process the schema using a real schema processor, and then use its API to ask questions about the contents of the schema". Does XPath have an API to retrieve information from the XSD, or is there a different convenient way with DOMDocument?
For the record, the original question
Now, I wanted to proceed to parse information from the DOMDocument using XPath. To increase the integrity of the data I'm storing in a database, and to give meaningful error messages to the client, I wanted to use the schema information throughout to validate the queries. I.e. I wanted to validate fetched childNodes against the child nodes allowed by the XSD, and I wanted to do that by running XPath queries on the XSD document.
However, I stumbled across this post. It basically says that doing this yourself is a quirky approach and that you should rather use a real schema processor and use its API to make the queries. If I understand that correctly, I am already using a real schema processor with schemaValidate, but what is meant by using its API?
I suspect I'm not using the schema the right way, but I have no idea how to research proper usage.
The question
If I use schemaValidate on the DOMDocument, is that a one-time validation (true or false), or does it stay tied to the DOMDocument afterwards? More precisely, can I somehow use the validation when adding nodes, or use it to select the nodes I'm interested in, as suggested by the referenced SO post?
Update
The question was rated unclear, so I'll try again. Say I want to add a node or edit a node value. Can I use the schema provided in the XSD to validate the user input? Originally, I wanted to query the XSD manually with another XPath instance to get the specs for a certain node. But as suggested in the linked article, this is not best practice. So the question is: does the DOM library offer any API for this kind of validation?
Maybe I'm overthinking it. Maybe I should just add the node, run the validation again and see where and why it breaks? In that case, the answer about custom error handling would be correct. Can you confirm?
Your question is not very clear, but it sounds like you want detailed reporting about any schema validation failures. While DOMDocument::schemaValidate() only returns a boolean, you can use libxml's internal error functions to get more detailed information.
We can start with your original code, only changing one thing at the top:
<?php
// without this, errors are echoed directly to screen and/or log
libxml_use_internal_errors(true);
$xmlPath = "file.xml";
$xsdPath = "schema.xsd";
$doc = new \DOMDocument();
$doc->loadXML(file_get_contents($xmlPath), LIBXML_NOBLANKS);
if (!$doc->schemaValidate($xsdPath)) {
    throw new InvalidXmlFileException();
}
And then we can make the interesting stuff happen in the exception which is presumably (based on the code you've provided) caught somewhere higher up in the code.
<?php
class InvalidXmlFileException extends \Exception
{
    private $errors = [];
    public function __construct()
    {
        foreach (libxml_get_errors() as $err) {
            $this->errors[] = self::formatXmlError($err);
        }
        libxml_clear_errors();
    }
    /**
     * Return an array of error messages
     *
     * @return array
     */
    public function getXmlErrors(): array
    {
        return $this->errors;
    }
    /**
     * Return a human-readable error message from a libxml error object
     *
     * @return string
     */
    private static function formatXmlError(\LibXMLError $error): string
    {
        $return = "";
        switch ($error->level) {
            case \LIBXML_ERR_WARNING:
                $return .= "Warning $error->code: ";
                break;
            case \LIBXML_ERR_ERROR:
                $return .= "Error $error->code: ";
                break;
            case \LIBXML_ERR_FATAL:
                $return .= "Fatal Error $error->code: ";
                break;
        }
        $return .= trim($error->message) .
            "\n  Line: $error->line" .
            "\n  Column: $error->column";
        if ($error->file) {
            $return .= "\n  File: $error->file";
        }
        return $return;
    }
}
So now when you catch your exception you can just iterate over $e->getXmlErrors():
try {
    // do stuff
} catch (InvalidXmlFileException $e) {
    foreach ($e->getXmlErrors() as $err) {
        echo "$err\n";
    }
}
For the formatXmlError function I just copied an example from the PHP documentation that parses the error into something human readable, but no reason you couldn't return some structured data or whatever you like.
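The asker's "just add the node and validate again" idea can be combined with returning structured data instead of strings. A minimal sketch, assuming a small made-up inline schema so the example is self-contained: it uses DOMDocument::schemaValidateSource() (with a real file you would call schemaValidate($xsdPath) as in the question), and validationErrors() is a hypothetical helper name:

```php
<?php
// Sketch: edit the document, re-validate, and collect libxml errors as arrays.
libxml_use_internal_errors(true);

// Made-up schema: <list> holds one or two integer <item> elements.
$xsd = <<<'XSD'
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="list">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" type="xs:integer" maxOccurs="2"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

/** Validate $doc against $xsd; return structured errors (empty if valid). */
function validationErrors(DOMDocument $doc, string $xsd): array
{
    libxml_clear_errors();
    if ($doc->schemaValidateSource($xsd)) {
        return [];
    }
    $errors = [];
    foreach (libxml_get_errors() as $err) {
        $errors[] = [
            'level'   => $err->level,
            'code'    => $err->code,
            'message' => trim($err->message),
            'line'    => $err->line,
        ];
    }
    libxml_clear_errors();
    return $errors;
}

$doc = new DOMDocument();
$doc->loadXML('<list><item>1</item></list>');
echo count(validationErrors($doc, $xsd)), " error(s) before the edit\n";

// Simulated user input: "abc" is not a valid xs:integer.
$doc->documentElement->appendChild($doc->createElement('item', 'abc'));
foreach (validationErrors($doc, $xsd) as $e) {
    echo "{$e['code']}: {$e['message']}\n";
}
```

This only tells you what broke after the change; it does not expose the schema's rules (type, minOccurs and so on) up front, which is what the next answer addresses.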
I think what you're looking for is the PSVI (post-schema-validation infoset); see this answer for some references.
Another option would be to use XPath 2.0, which has operators to check schema types.
I don't know whether there are PHP libraries that give you access to the PSVI or can run XPath 2.0 queries; in Java there is Xerces for the PSVI and Saxon for XPath 2.0.
For example, with Xerces it is possible to cast a DOM Element to a Xerces ElementPSVI in order to get the schema information of an element.
Be warned that using XPath on the schema (as you were doing) will only work for simple cases. The XML of the schema document is very different from the actual schema model (the assembled schema), which is a graph of components whose properties are computed from the XML declarations in the schema file, but by very complex rules that are almost impossible to recreate with XPath.
So you need at least the PSVI or XPath 2.0 queries, but in my experience, obtaining decent validation for application users from an XML schema is difficult.
What are you trying to achieve?
I have a strange problem with \Doctrine\ORM\UnitOfWork::getScheduledEntityDeletions() when it is used inside an onFlush event listener:
foreach ($unitOfWork->getScheduledEntityDeletions() as $entity) {
    if ($entity instanceof PollVote) {
        $arr = $entity->getAnswer()->getVotes()->toArray();
        dump($arr);
        dump($entity);
        dump(in_array($entity, $arr, true));
        dump(in_array($entity, $arr));
    }
}
And here is the result:
So we see that the object in the collection is a different instance from the original, and therefore in_array no longer yields the expected result when used with strict comparison (AKA ===). Furthermore, the \DateTime object is also a different instance.
The only possible explanation I found is the following (source):
Whenever you fetch an object from the database Doctrine will keep a copy of all the properties and associations inside the UnitOfWork. Because variables in the PHP language are subject to “copy-on-write” the memory usage of a PHP request that only reads objects from the database is the same as if Doctrine did not keep this variable copy. Only if you start changing variables PHP will create new variables internally that consume new memory.
However, I did not change anything (even the created field is kept as it is). The only operations that were performed on the entity are:
\Doctrine\ORM\EntityRepository::findBy (fetching from DB)
\Doctrine\Common\Persistence\ObjectManager::remove (scheduling for removal)
$em->flush(); (triggering synchronization with DB)
This leads me to think (I might be wrong) that Doctrine's change-tracking method has nothing to do with the issue I'm experiencing, which in turn leads me to the following questions:
What causes this?
How do I reliably check whether an entity scheduled for deletion is inside a collection (\Doctrine\Common\Collections\Collection::contains uses in_array with strict comparison), or which items in a collection are scheduled for deletion?
The problem is that when you tell Doctrine to remove an entity, it is removed from the identity map (here):
<?php
public function scheduleForDelete($entity)
{
    $oid = spl_object_hash($entity);
    // ....
    $this->removeFromIdentityMap($entity);
    // ...
    if ( ! isset($this->entityDeletions[$oid])) {
        $this->entityDeletions[$oid] = $entity;
        $this->entityStates[$oid] = self::STATE_REMOVED;
    }
}
And when you do $entity->getAnswer()->getVotes(), it does the following:
Load all votes from the database
For every vote, check whether it is in the identity map; if it is, use the existing instance
If it is not in the identity map, create a new object
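A toy model of these steps shows why strict comparison fails while the identifiers still match. The classes below are hypothetical and only mimic the identity-map behaviour described above; they are NOT Doctrine's UnitOfWork code. Comparing identifiers rather than instances is one possible workaround for the asker's second question, offered here as an assumption, not something the answers confirm:

```php
<?php
// Toy model only: NOT Doctrine's UnitOfWork, just the identity-map idea.
final class Vote
{
    public function __construct(public int $id) {}
}

final class ToyIdentityMap
{
    /** @var array<int, Vote> */
    private array $map = [];

    // Return the registered instance for $id, or create and register one.
    public function load(int $id): Vote
    {
        return $this->map[$id] ??= new Vote($id);
    }

    // Like scheduleForDelete(): forget the instance, so the next load()
    // for the same id builds a brand-new object.
    public function remove(Vote $vote): void
    {
        unset($this->map[$vote->id]);
    }
}

$identityMap = new ToyIdentityMap();
$vote = $identityMap->load(12);      // fetched, e.g. via findBy()
$identityMap->remove($vote);         // scheduled for deletion
$votes = [$identityMap->load(12)];   // collection loaded afterwards

var_dump(in_array($vote, $votes, true)); // bool(false): different instance
var_dump(in_array($vote, $votes));       // bool(true): equal property values

// Comparing identifiers instead of instances still finds the vote.
$ids = array_map(fn (Vote $v): int => $v->id, $votes);
var_dump(in_array($vote->id, $ids, true)); // bool(true)
```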
Try calling $entity->getAnswer()->getVotes() before you delete the entity. If the problem disappears, then I am right. Of course, I would not suggest this hack as a solution; it is just a way to make sure we understand what is going on under the hood.
UPD: Instead of just calling $entity->getAnswer()->getVotes(), you should probably iterate over all the votes with foreach, because of lazy loading. If you only call $entity->getAnswer()->getVotes(), Doctrine probably won't do anything and will load the votes only when you start iterating over them.
From the doc:
If you call the EntityManager and ask for an entity with a specific ID twice, it will return the same instance
So calling findOneBy(['id' => 12]) twice should return the exact same instance both times.
So it all depends on how both instances are retrieved by Doctrine.
In my opinion, the one you get in $arr comes from a one-to-many association on $votes in the Answer entity, which results in a separate query by the ORM (maybe an id IN (12)).
Something you could try is declaring this association as EAGER (fetch="EAGER"); it may force the ORM to make a specific query and keep the result in cache, so that the same instance is returned the second time you ask for it?
Could you have a look at the logs and post them here? They may indicate something interesting, or at least something relevant to investigate further.
Within a huge dataset, I sometimes get inconsistencies when a document is deleted. This is a Symfony2 app with Doctrine ODM and FOSRest.
$a = new Element();
$b = new Element();
$c = new List();
$c->addElement($a);
$c->addElement($b);
$em->persist($c);
Saving at this point works flawlessly.
In 99% of cases, $a and $b are still valid documents when $c is loaded later.
BUT sometimes either $a or $b is deleted without the reference in $c being updated.
-> At that point, the next load of $c fails with a \Doctrine\ODM\MongoDB\DocumentNotFoundException
(the message is something like: The "MongoDBODMProxies__CG__\App\Model\Element" document with identifier "541417702798711d2900607c" could not be found.)
What is the best approach now to handle this case?
I was thinking about either:
catching the exception and checking whether the reference it tried to load was on the Element model
a custom exception handler in FOSRest to check for this
a custom repository function in the mapping that checks whether everything is still valid (and somehow stores that there is a missing Element) -> but this then forces me to check on every occasion whether the "error" is set
UPDATE: The mapping between the documents is a bit more complex than I described here:
for one, the element is basically a collection separated by a discriminator, where only one type's fields reference another document (I'll call it "Tree" from now on)
a Tree can be used in thousands of ElementTrees (the specific type that contains a Tree)
sometimes Trees are deleted (this is already a slow-running process, since a lot of data needs to be updated then)
I would now need to find out which Lists need to change, and basically reject API calls to those lists with the information that a specific element is no longer available.
A few things to check especially for MongoDB:
Make sure that there are no circular references (for example, if you have the property $elements on the class List with references set to true, make sure List is not referenced on the Element class as well) and that your mappings are consistent.
In the addElement function, IF the reference is held on the Element class, make sure you also call $element->setList($this) inside the function (and do the same for removeElement, unsetting the reference if necessary).
Make sure you cascade all the necessary operations (for example cascade: ["persist", "remove", "refresh"] or "all").
You can check your mappings with
$ app/console doctrine:mongodb:mapping:info
Finally, if you expect that document to be deleted but you get an error from the proxy object, you can clear the metadata cache:
$ app/console doctrine:mongodb:cache:clear-metadata
Imperfect solution that works for now
I now chose to throw my own exception (it is important not to let Doctrine throw one, because it will then reject any persist attempts in the same request).
In the PostLoad lifecycle event I now check the following (simplified):
if ($document instanceof List) {
    foreach ($document->getElements() as $element) {
        // at this moment $element->getId() is already defined but not yet loaded from mongo
        $result = $this->elementRepository->findBy(array('_id' => $element->getId()));
        if (sizeof($result) == 0) {
            throw new InvalidElementInList($element->getId());
        }
    }
}
In the RestController, this now lets me catch this specific exception, remove the invalid element from the list, and return a custom view to the user indicating that the element was removed.
I created an XML file that stores some information for me. Now I want to get the elements that meet certain conditions.
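The check above issues one findBy() per element. A sketch of the same detection done in one pass, assuming the IDs that still exist have already been fetched with a single query for all referenced IDs (that query is not shown) and that IDs are compared as strings; findMissingIds() is a hypothetical helper, not part of Doctrine, and the sample IDs are made up:

```php
<?php
/**
 * Hypothetical helper (not part of Doctrine): return the referenced element
 * IDs that no longer exist.
 *
 * @param string[] $referencedIds IDs stored in the list's references
 * @param string[] $existingIds   IDs returned by ONE lookup for all of them
 * @return string[]
 */
function findMissingIds(array $referencedIds, array $existingIds): array
{
    // array_diff() keeps the original keys; array_values() reindexes them.
    return array_values(array_diff($referencedIds, $existingIds));
}

// Made-up IDs standing in for MongoDB ObjectIds.
$referenced = ['id-a', 'id-b', 'id-c'];
$existing   = ['id-b'];  // pretend result of the single lookup

foreach (findMissingIds($referenced, $existing) as $missingId) {
    echo "Missing element: $missingId\n";
}
```

Besides saving queries, this reports every dangling reference at once instead of stopping at the first one.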
At the moment this looks like this:
function getElements($xmlObject, $name) {
    $aSubFeatures = array();
    foreach ($xmlObject->feature as $feature) {
        if (stristr($feature->path, $name)) {
            array_push($aSubFeatures, $feature);
        }
    }
    return $aSubFeatures;
}
But I'd prefer to get an object as the return value. I used SimpleXML to load the XML file as an object.
I also tried using DOM (creating a new DOMDocument and trying to append the feature element objects I got), but without a reasonable result.
Would deleting all non-matching parts of the XML be a solution? I did not find a way to delete specific elements...
Thanks for your help
To append an element of an existing DOMDocument to a new DOMDocument, you have to call $newdom->importNode($nodeInOldDOM, true) and append the node it returns. You cannot do a regular appendChild() with a node from another document.
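Applied to the question, a minimal sketch that keeps the asker's stristr() filter but builds a new document and returns it as a SimpleXMLElement. It assumes a <features> root containing <feature> elements with a <path> child (inferred from the question's code; the sample XML is made up), and uses dom_import_simplexml() and simplexml_import_dom() to cross between SimpleXML and DOM:

```php
<?php
// Sketch: copy matching <feature> elements into a new document via importNode().
function getElements(SimpleXMLElement $xmlObject, string $name): SimpleXMLElement
{
    $newDom = new DOMDocument();
    $root = $newDom->appendChild($newDom->createElement('features'));

    foreach ($xmlObject->feature as $feature) {
        // Case-insensitive substring match, as in the question's stristr().
        if (stristr((string) $feature->path, $name) !== false) {
            // SimpleXML -> DOM node, then a deep import so it belongs to $newDom.
            $domFeature = dom_import_simplexml($feature);
            $root->appendChild($newDom->importNode($domFeature, true));
        }
    }

    // Back to SimpleXML: the result is the new <features> element.
    return simplexml_import_dom($newDom);
}

$xml = simplexml_load_string(
    '<features>'
    . '<feature><path>/Audio/Mixer</path></feature>'
    . '<feature><path>/Video/Codec</path></feature>'
    . '<feature><path>/audio/Output</path></feature>'
    . '</features>'
);

$subset = getElements($xml, 'audio');
echo count($subset->feature), "\n"; // 2
echo $subset->asXML();
```

Building a filtered copy like this leaves the original document untouched, so nothing has to be deleted from it.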
About a year ago I wrote a jQuery-inspired library which allowed you to manipulate the DOM using PHP's XPath and DOMDocument. I recently wanted to clean it up and post it as an open source project. I've been spending the past few days making improvements and implementing some more of PHP's native OO features.
Anyhow, I thought I'd add a new method that merges a separate XML document into the current one. The catch here is that this method asks for 2 XPath expressions: the first one fetches the elements you want to merge into the existing document, and the second specifies the destination path of the merged elements.
The method works well in fetching matching elements from both paths, but I'm having issues with importing the foreign elements into the current DOM. I keep getting the dreaded 'Wrong Document Error' message.
I thought I knew what I was doing, but I suppose I was wrong. In the following code you can see that I'm first iterating through the current document's matching elements, then through the foreign document's matching elements.
Within the second, nested loop I attempt to merge each foreign element into the destination path in the current document.
I'm not sure what I'm doing wrong here, as I'm clearly importing the foreign node into the current document before appending it.
public function merge($source, $path_origin, $path_destination)
{
    $Dom = new self;
    if(false == $Dom->loadXml($source))
    {
        throw new DOMException('XML source could not be loaded into the DOM.');
    }
    $XPath = new DOMXPath($Dom);
    foreach($this->path($path_destination, true) as $Destination)
    {
        if(false == in_array($Destination->nodeName, array('#text', '#document')))
        {
            foreach($XPath->query($path_origin) as $Origin)
            {
                if(false == in_array($Destination->nodeName, array('#text', '#document')))
                {
                    $this->importNode($Origin, true);
                    $Destination->appendChild($Origin->cloneNode(true));
                }
            }
        }
    }
    return $this;
}
You can find the library in its entirety in the following Github repo:
http://github.com/wilhelm-murdoch/DomQuery
Halps!!!
importNode doesn't "change" the node so that it belongs to another document. It creates a new node belonging to the new document and returns it. So you should take its return value and use that in appendChild:
$Destination->appendChild($this->importNode($Origin, true));
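A self-contained sketch of the merge loop with that fix applied, written as a standalone function so it runs without the library. mergeInto() and its parameters are hypothetical (DomQuery's real method lives on its own DOMDocument subclass), and the `instanceof DOMElement` test is a simplification of the original nodeName check:

```php
<?php
// Sketch: merge nodes matched in a foreign XML string into $target.
function mergeInto(DOMDocument $target, string $source, string $pathOrigin, string $pathDestination): DOMDocument
{
    $foreign = new DOMDocument();
    if (!$foreign->loadXML($source)) {
        throw new DOMException('XML source could not be loaded into the DOM.');
    }

    $targetXPath  = new DOMXPath($target);
    $foreignXPath = new DOMXPath($foreign);

    foreach ($targetXPath->query($pathDestination) as $destination) {
        if ($destination instanceof DOMElement) { // skip text/document nodes
            foreach ($foreignXPath->query($pathOrigin) as $origin) {
                // importNode() returns a NEW node owned by $target: append that one.
                $destination->appendChild($target->importNode($origin, true));
            }
        }
    }

    return $target;
}

$doc = new DOMDocument();
$doc->loadXML('<library><shelf/><shelf/></library>');
mergeInto($doc, '<books><book>A</book><book>B</book></books>', '//book', '//shelf');
echo $doc->saveXML($doc->documentElement), "\n";
// <library><shelf><book>A</book><book>B</book></shelf><shelf><book>A</book><book>B</book></shelf></library>
```

Each destination gets its own imported copy, because a single node can only have one parent.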