In PHP, I am trying to validate an XML document using a DTD specified by my application - not by the externally fetched XML document. The validate method in the DOMDocument class seems to only validate using the DTD specified by the XML document itself, so this will not work.
Can this be done, and how, or do I have to translate my DTD to an XML schema so I can use the schemaValidate method?
(this seems to have been asked in Validate XML using a custom DTD in PHP but without correct answer, since the solution only relies on DTD speicified by the target XML)
Note: XML validation could be subject to the Billion Laughs attack, and similar DoS vectors.
This essentially does what rojoca mentioned in his comment:
<?php
$xml = <<<END
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo SYSTEM "foo.dtd">
<foo>
<bar>baz</bar>
</foo>
END;
$root = 'foo';
$old = new DOMDocument;
$old->loadXML($xml);
$creator = new DOMImplementation;
$doctype = $creator->createDocumentType($root, null, 'bar.dtd');
$new = $creator->createDocument(null, null, $doctype);
$new->encoding = "utf-8";
$oldNode = $old->getElementsByTagName($root)->item(0);
$newNode = $new->importNode($oldNode, true);
$new->appendChild($newNode);
$new->validate();
?>
This will validate the document against the bar.dtd.
You can't just call $new->loadXML(), because that would just set the DTD to the original, and the doctype property of a DOMDocument object is read-only, so you have to copy the root node (with everything in it) to a new DOM document.
I only just had a go with this myself, so I'm not entirely sure if this covers everything, but it definitely works for the XML in my example.
Of course, the quick-and-dirty solution would be to first get the XML as a string, search and replace the original DTD by your own DTD and then load it.
I think that's only possible with XSD, see:
http://php.net/manual/en/domdocument.schemavalidate#62032
Related
I am using xsd2php library to parse XSD which describes API request body. Then using the same library (which itself uses jsm-serializer) I try to serialize objects:
$payload = new TrackRequest;
$searchCriteria = new SearchCriteriaAType;
$searchCriteria->addToConsignmentNumber(11111);
$payload->setSearchCriteria($searchCriteria);
$levelOfDetail = new LevelOfDetailAType;
$levelOfDetail->setSummary(true);
$payload->setLevelOfDetail($levelOfDetail);
Using basic serializer settings:
$serializerBuilder = SerializerBuilder::create();
$serializerBuilder->addMetadataDir(__DIR__ . '/../../metadata/Tracking', 'TNTExpressConnect\Tracking\XSD');
$serializerBuilder->setPropertyNamingStrategy(new IdenticalPropertyNamingStrategy);
$serializerBuilder->configureHandlers(function (HandlerRegistryInterface $handler) use ($serializerBuilder) {
$serializerBuilder->addDefaultHandlers();
$handler->registerSubscribingHandler(new BaseTypesHandler()); // XMLSchema List handling
$handler->registerSubscribingHandler(new XmlSchemaDateHandler()); // XMLSchema date handling
});
Serialization results in:
<?xml version="1.0" encoding="UTF-8"?>
<result>
<searchCriteria>
<account/>
<alternativeConsignmentNumber/>
<consignmentNumber>
<entry><![CDATA[11111]]></entry>
</consignmentNumber>
<customerReference/>
<pieceReference/>
</searchCriteria>
<levelOfDetail>
<summary>true</summary>
</levelOfDetail>
</result>
Regarding this results I have several questions:
Why the root element is <result> and not <TrackRequest>?
How to get rid of CDATA?
How to get rid of <entry> tags in favor of creating separate consigmentNumber tag for each entry?
How to replace <summary>true</summary> with self-closing tag <summary/>
I guess for every one of this cases I can create a dedicated handler, but maybe there is a built-in solution, which I overlooked in the documentation (maybe some config options that can be placed in yaml).
And if I have to create handlers maybe someone can point me the more sophisticated example, that explains how to do it right.
I'm not a big fan of annotations, so I would prefer to use separate config files.
Thank you in advance.
You should have a look ar the YAML Reference. A lot of things can be set up with the meta data files.
To change the "result" to "TrackRequest" add this line to the file:
Vendor\MyBundle\Model\ClassName:
xml_root_name: TrackRequest ## Changes the root element
To get rid of cdata in entry change the property:
properties:
entry:
xml_element:
cdata: false ## Add this to disable cdata tags
Just came accross the same problems as you did. I hope it helps.
I try to validate this document in PHP using DOMdocument's schemaValidate:
<?xml version="1.0" encoding="UTF-8"?> <works xmlns="http://pbn.nauka.gov.pl/-/ns/bibliography" pbn-unit-id="1388"><article><title>Mukowiscydoza</title></article></works>
by using $domDocument->schemaValidate('pbn-report.xsd')
Link to XSD:
https://pbn.nauka.gov.pl/help/images/files/pbn-report.xsd.zip
... and I always get an error
Error 1871: Element 'article': This element is not expected. Expected
is one of ( {http://pbn.nauka.gov.pl/-/ns/bibliography}article,
{http://pbn.nauka.gov.pl/-/ns/bibliography}book,
{http://pbn.nauka.gov.pl/-/ns/bibliography}chapter ). on line 0
For me it is incomprehensible. Why do I get an error when I pointed out the default namespace?
Solved.
It turns out that when you create a DOMDocument, when you add an Element every time you need to give Namespace. When generating a document (saveXML) will not make any difference, but if you run schemaValidate, the validator checks DOMDocument object, and not the generated XML.
In other words this code:
$domDocument = new DOMDocument('1.0', "UTF-8");
$domWorks = $domDocument->createElementNS("http://pbn.nauka.gov.pl/-/ns/bibliography",'works');
$domWorksId = $domDocument->createAttribute('pbn-unit-id');
$domWorksId->value = PBNID;
$domWorks->appendChild($domWorksId);
$domDocument->appendChild($domWorks);
$domArticle = $domDocument->createElement('article');
$domArticle->appendChild($domDocument->createElement('title','Mukowiscydoza'));
$domWorks->appendChild($domArticle);
echo htmlentities($domDocument->saveXML());
generates the same XML as this code
$domDocument = new DOMDocument('1.0', "UTF-8");
$domWorks = $domDocument->createElementNS("http://pbn.nauka.gov.pl/-/ns/bibliography",'works');
$domWorksId = $domDocument->createAttribute('pbn-unit-id');
$domWorksId->value = PBNID;
$domWorks->appendChild($domWorksId);
$domDocument->appendChild($domWorks);
$domArticle = $domDocument->createElementNS("http://pbn.nauka.gov.pl/-/ns/bibliography",'article');
$domArticle->appendChild($domDocument->createElementNS("http://pbn.nauka.gov.pl/-/ns/bibliography",'title','Mukowiscydoza'));
$domWorks->appendChild($domArticle);
echo htmlentities($domDocument->saveXML());
But if you check schema
$domDocument->schemaValidate('pbn-report.xsd');
, the first code will return an error.
Strange ...
Strange ...
Well not really. As long as the document is in memory, the information about the namespace(s) with the elements is preserved.
In that case the two different methods / parameter here really make a difference even if you don't see a difference in the generated XML (afterwards):
// null namespace
$domArticle = $domDocument->createElement('article');
// vs. concrete namespace
$domArticle = $domDocument->createElementNS(
'http://pbn.nauka.gov.pl/-/ns/bibliography', 'article'
);
You then serialize the document (what you describe as "generates the same XML") as XML and you then load that XML back into memory. Then the elements with no namespace aren't within the null namespace any longer because they inherit their namespace from their parent element.
So you must differ between the document and it's elements in memory (DOM) and in the serialized form (string, file).
You can have similar effects when you do XSLT transformations. So if you experience something strange, it's worth to consider that the document in memory is not representing what you first think even it creates similar - or even exact same - looking XML ;)
Try to put the xmlns inside the article element , then try again.
xmlns="http://pbn.nauka.gov.pl/-/ns/bibliography"
Firstly can you tell me whether this xml:
<adf:source xsi:schemaLocation="http://www.rightmove.co.uk/adf/rightmoveV4n.xsd rightmoveV4n.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:adf="http://www.rightmove.co.uk/adf/rightmoveV4n.xsd">
</source>
Is correct? I can't see how the document starts with: <adf: source> and closes with </source>, doesn't seem right to me?
I have replicated the structure using my own data but cannot get PHP's XMLWriter() to close the document with just </source> - it closes it with </adf:source>.
I'm doing:
$xml = new XMLWriter();
$xml->openMemory();
$xml->startDocument();
$xml->startElementNS("adf", "source", "http://www.rightmove.co.uk/adf/rightmoveV4n.xsd");
$xml->writeAttributeNS ("xsi", "schemaLocation", "http://www.w3.org/2001/XMLSchema-instance", "http://www.rightmove.co.uk/adf/rightmoveV4n.xsd rightmoveV4n.xsd");
and then eventually
$xml->endElement ();
echo $xml->outputMemory();
No, your XML is not well-formed. The root node of an XML document must be opened and closed with the same element. As far as an XML parser is concerned, <adf:source> and <source> are entirely different.
The adf: in front of the source element is a so-called namespace prefix, which is like a shorthand way of saying: "This element belongs to the namespace http://www.rightmove.co.uk/adf/rightmoveV4n.xsd".
So, the behaviour of XMLWriter() is to be expected and perfectly fine. On the other hand, an application that produces the XML document you have shown is clearly in error.
Validating any VAST2.0 XML tag
$xsdPath='https://github.com/chrisdinn/vast/blob/master/lib/vast_2.0.1.xsd'
$domdoc= new DOMDocument();
$domdoc->loadHTML($xml_input);
if(!$domdoc->schemaValidate($xsdPath)){/* ... */}
returns nonsense messages like Error 1845: Element 'html': No matching global declaration available for the validation root.
In my opinion, this does not really make sense because both the schema xsd and the vast xml do not contain or require a markup or element with the name .
Trying the same with
$reader = new XMLReader();
$reader->XML($xml_input);
$valid = $reader->setSchema($xsdPath);
$reader->read();
$reader->close();
returns the same error codes.
I checked the xsd twiche. It is the same like on https://github.com/chrisdinn/vast/blob/master/lib/vast_2.0.1.xsd.
Any idea how to fix this?
To load XML you should use loadXML(), not loadHTML():
$xsdPath = 'https://raw.github.com/chrisdinn/vast/master/lib/vast_2.0.1.xsd';
// ^^^
// using raw version
$domdoc= new DOMDocument();
$domdoc->loadXML($xml_input);
// ^^^
// Not loading HTML here
if (!$domdoc->schemaValidate($xsdPath)) {
// ...
}
When I dereference the URI you give for the XSD schema document, I don't get an XSD schema document. I get an HTML document which displays a rendering of the XSD schema document. It makes perfect sense to me for a validator expecting to see an xs:schema element to issue the error message you quote, when instead it sees an HTML element.
You can either find a URI that actually serves the XML document your validator needs, or you can make a local copy and point to that local copy. But expecting PHP's schema validation to find the XSD document buried in that HTML is asking more than you can reasonably expect.
The error is rather straight forward:
Error 1845: Element 'html': No matching global declaration available for the validation root.
Means that the element <html> is not declared in the XSD therefore the document can not be validated with that XSD.
In my opinion, this does not really make sense because both the schema xsd and the vast xml do not contain or require a markup or element with the name.
You're loading a HTML document. Regardless if the string/file contains that html element or not, the DOMDocument does contain it so the validation tries to validate it against the XSD and then fails because the XSD does not have any declaration for it.
Is it possible to use PHP's SimpleXML functions to create an XML object from scratch? Looking through the function list, there's ways to import an existing XML string into an object that you can then manipulate, but if I just want to generate an XML object programmatically from scratch, what's the best way to do that?
I figured out that you can use simplexml_load_string() and pass in the root string that you want, and then you've got an object you can manipulate by adding children... although this seems like kind of a hack, since I have to actually hardcode some XML into the string before it can be loaded.
I've done it using the DOMDocument functions, although it's a little confusing because I'm not sure what the DOM has to do with creating a pure XML document... so maybe it's just badly named :-)
Sure you can. Eg.
<?php
$newsXML = new SimpleXMLElement("<news></news>");
$newsXML->addAttribute('newsPagePrefix', 'value goes here');
$newsIntro = $newsXML->addChild('content');
$newsIntro->addAttribute('type', 'latest');
Header('Content-type: text/xml');
echo $newsXML->asXML();
?>
Output
<?xml version="1.0"?>
<news newsPagePrefix="value goes here">
<content type="latest"/>
</news>
Have fun.
In PHP5, you should use the Document Object Model class instead.
Example:
$domDoc = new DOMDocument;
$rootElt = $domDoc->createElement('root');
$rootNode = $domDoc->appendChild($rootElt);
$subElt = $domDoc->createElement('foo');
$attr = $domDoc->createAttribute('ah');
$attrVal = $domDoc->createTextNode('OK');
$attr->appendChild($attrVal);
$subElt->appendChild($attr);
$subNode = $rootNode->appendChild($subElt);
$textNode = $domDoc->createTextNode('Wow, it works!');
$subNode->appendChild($textNode);
echo htmlentities($domDoc->saveXML());
Please see my answer here. As dreamwerx.myopenid.com points out, it is possible to do this with SimpleXML, but the DOM extension would be the better and more flexible way. Additionally there is a third way: using XMLWriter. It's much more simple to use than the DOM and therefore it's my preferred way of writing XML documents from scratch.
$w=new XMLWriter();
$w->openMemory();
$w->startDocument('1.0','UTF-8');
$w->startElement("root");
$w->writeAttribute("ah", "OK");
$w->text('Wow, it works!');
$w->endElement();
echo htmlentities($w->outputMemory(true));
By the way: DOM stands for Document Object Model; this is the standardized API into XML documents.