A strange thing happened after a supplier changed the XML header a bit. I used to be able to read stuff using xpath, but now I can't even get a reply with
$xml->xpath('/');
They changed it from this...
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE NewsML SYSTEM "http://www.newsml.org/dl.php?fn=NewsML/1.2/specification/NewsML_1.2.dtd" [
<!ENTITY % nitf SYSTEM "http://www.nitf.org/IPTC/NITF/3.4/specification/dtd/nitf-3-4.dtd">
%nitf;
]>
<NewsML>
...
to this:
<?xml version="1.0" encoding="iso-8859-1"?>
<NewsML
xmlns="http://iptc.org/std/NewsML/2003-10-10/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://iptc.org/std/NewsML/2003-10-10/ http://www.iptc.org/std/NewsML/1.2/specification/NewsML_1.2.xsd http://iptc.org/std/NITF/2006-10-18/ http://contentdienst.pressetext.com/misc/nitf-3-4.xsd"
>
...
Most likely this is because they've introduced a default namespace (xmlns="http://iptc.org/std/NewsML/2003-10-10/") into their document. SimpleXML's support for default namespaces is not very good, to put it mildly.
Can you try to explicitly register a namespace prefix:
$xml->registerXPathNamespace("n", "http://iptc.org/std/NewsML/2003-10-10/");
$xml->xpath('/n:NewsML');
You would have to adapt your XPath expressions to use the "n:" prefix on every element. Here is some additional info: http://people.ischool.berkeley.edu/~felix/xml/php-and-xmlns.html.
EDIT: As per the spec:
The registerXPathNamespace() function creates a prefix/ns context for the next XPath query.
This means it would have to be called before every XPath query, thus a function to wrap XPath queries would be the natural thing to do:
function simplexml_xpath_ns($element, $xpath, $xmlns)
{
foreach ($xmlns as $prefix_uri)
{
list($prefix, $uri) = explode("=", $prefix_uri, 2);
$element->registerXPathNamespace($prefix, $uri);
}
return $element->xpath($xpath);
}
Usage:
$xmlns = ["n=http://iptc.org/std/NewsML/2003-10-10/"];
$result = simplexml_xpath_ns($xml, '/n:NewsML', $xmlns);
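To see the effect of the default namespace in isolation, here is a minimal sketch. The XML is a made-up, trimmed stand-in for the supplier's feed (the NewsItem and Identification content is only for illustration), and the prefix "n" is an arbitrary choice:

```php
<?php
// Made-up stand-in for the feed; only the default namespace matters here.
$source = <<<'XML'
<?xml version="1.0"?>
<NewsML xmlns="http://iptc.org/std/NewsML/2003-10-10/">
  <NewsItem>
    <Identification>example</Identification>
  </NewsItem>
</NewsML>
XML;

$xml = simplexml_load_string($source);

// Unprefixed names in XPath 1.0 mean "no namespace", so this matches nothing.
$plain = $xml->xpath('/NewsML') ?: [];

// Bind our own prefix to the document's default namespace and use it on every step.
$xml->registerXPathNamespace('n', 'http://iptc.org/std/NewsML/2003-10-10/');
$prefixed = $xml->xpath('/n:NewsML/n:NewsItem/n:Identification') ?: [];

var_dump(count($plain));
var_dump(count($prefixed));
var_dump((string) $prefixed[0]);
```

The unprefixed query comes back empty while the prefixed one finds the element, which matches the symptom described in the question.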
Related
I received these xml from external services:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href=(...)?>
<pos:Document xmlns:pos=(...) xmlns:str=(...) xmlns:xsi=(...) xsi:schemaLocation=(...)>
<pos:DescribeDocument>
(...)
</pos:DescribeDocument>
<pos:UPP>
(...)
</pos:UPP>
<ds:Signature Id="ID-9326" xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
<ds:SignedInfo Id="ID-9325" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:pos="adress" xmlns:str="adress" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
(...)
</ds:SignedInfo>
<ds:Object>
<xades:QualifyingProperties Id="ID-9337a6d1" Target="#ID-932668c0-d4f9-11e3-bb2d-001a645ad128" xmlns:xades="http://uri.etsi.org/01903/v1.3.2#">
<xades:SignedProperties Id="ID-9337a6d0" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:pos="adress" xmlns:str="adress" xmlns:xades="adress" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<xades:SignedSignatureProperties>
<xades:SigningTime>sometime</xades:SigningTime>
<xades:SigningCertificate>
<xades:Cert>
<xades:CertDigest>
<ds:DigestMethod Algorithm="adress"/>
<ds:DigestValue>someValue</ds:DigestValue>
</xades:CertDigest>
<xades:IssuerSerial>
<ds:X509IssuerName>CNsomeValue</ds:X509IssuerName>
<ds:X509SerialNumber>SerialsomeValue</ds:X509SerialNumber>
</xades:IssuerSerial>
</xades:Cert>
</xades:SigningCertificate>
<xades:SignaturePolicyIdentifier>
<xades:SignaturePolicyImplied/>
</xades:SignaturePolicyIdentifier>
</xades:SignedSignatureProperties>
<xades:SignedDataObjectProperties>
<xades:DataObjectFormat ObjectReference="#ID-93257e60">
<xades:Description>NEEDVALUE</xades:Description>
</xades:DataObjectFormat>
</xades:SignedDataObjectProperties>
</xades:SignedProperties>
</xades:QualifyingProperties>
</ds:Object>
</ds:Signature>
</pos:Document>
It has several namespaces, and I have to get the value marked NEEDVALUE (the xades:Description element).
I wrote some code, but nothing works:
$xmlFileContent = file_get_contents($pathToXML);
$dom = new SimpleXMLElement($xmlFileContent, LIBXML_COMPACT);
$namespaces = $dom->getNamespaces(true);
foreach ($namespaces as $key => $namespace) {
$dom->registerXPathNamespace($key, $namespace);
}
$matches = $dom->xpath('//xades:Description'); //no success
and
$doms = new DOMDocument;
$doms->loadXML($path);
foreach($doms->getElementsByTagNameNS($namespaces['xades'],'*') as $element){
echo 'local name: ', $element->localName, ', prefix: ', $element->prefix, "\n"; //no success
}
Can you help me get to this node (xades:Description)?
PS:
I tried this too (but no success):
$result1 = $dom->xpath('/Dokument/ds:Signature/ds:Object/xades:QualifyingProperties/xades:SignedProperties/xades:SignedDataObjectProperties/xades:DataObjectFormat/xades:Description');
You removed the namespace definitions from your XML. If you see an attribute like 'xmlns:xades' the actual namespace is the value of that attribute. It defines an alias/prefix for that namespace in the current context. Most of the time URLs are used as namespace identifiers (because it avoids potential conflicts). But a value like urn:somestring is valid.
You need to register a prefix for the namespace on the DOMXpath object, too. This prefix does not need to be identical to the one in the document. Look at it this way:
DOMDocument resolves a node name from prefix:Description to {namespace-uri}:Description.
DOMXpath resolves the node names in the Xpath expressions the same way using its own definition. For example //prefix:Description to //{namespace-uri}:Description. Then it compares the resolved names.
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
$xpath->registerNamespace('x', 'urn:xades');
var_dump($xpath->evaluate('string(//x:Description)'));
Output: https://eval.in/181304
string(9) "NEEDVALUE"
There is a possible workaround, in case you can't find a proper solution: modify your XPath to ignore namespaces:
$result = $dom->xpath('//*[local-name() = "Description"]');
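One caveat with local-name() alone is that it also matches same-named elements from other namespaces. A slightly stricter sketch, assuming the XAdES URI shown on xades:QualifyingProperties in the question, adds a namespace-uri() test; the sample document below is a cut-down stand-in, not the real signed file:

```php
<?php
// Cut-down stand-in for the signed document.
$source = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xades="http://uri.etsi.org/01903/v1.3.2#">
  <Description>unrelated</Description>
  <xades:DataObjectFormat>
    <xades:Description>NEEDVALUE</xades:Description>
  </xades:DataObjectFormat>
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($source);
$xpath = new DOMXPath($dom);

// local-name() alone matches both Description elements.
$loose = $xpath->evaluate('count(//*[local-name() = "Description"])');

// Adding namespace-uri() narrows the match to the XAdES element,
// still without registering any prefix.
$strict = $xpath->evaluate(
    'string(//*[local-name() = "Description"'
    . ' and namespace-uri() = "http://uri.etsi.org/01903/v1.3.2#"])'
);

var_dump($loose, $strict);
```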
I'm pretty new to this, and I've followed several tutorials (including other SO questions), but I can't seem to get this to work.
I'm working with a library's EAD file (Library of Congress XML standard for describing library collections, http://www.loc.gov/ead/index.html), and I'm having trouble with the namespaces.
A simplified example of the XML:
<?xml version="1.0"?>
<ead xsi:schemaLocation="urn:isbn:1-931666-22-9 http://www.loc.gov/ead/ead.xsd" xmlns:ns2="http://www.w3.org/1999/xlink" xmlns="urn:isbn:1-931666-22-9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<c02 id="ref24" level="item">
<did>
<unittitle>"Lepidoptera and seas(on) of appearance"</unittitle>
<unitid>1</unitid>
<container id="cid71717" type="Box" label="Mixed materials">1</container>
<physdesc>
<extent>Pencil</extent>
</physdesc>
<unitdate>[1817]</unitdate>
</did>
<dao id="ref001" ns2:actuate="onRequest" ns2:show="embed" ns2:role="" ns2:href="http://diglib.amphilsoc.org/fedora/repository/graphics:92"/>
</c02>
<c02 id="ref25" level="item">
<did>
<unittitle>Argus carryntas (Butterfly)</unittitle>
<unitid>2</unitid>
<container id="cid71715" type="Box" label="Mixed materials">1</container>
<physdesc>
<extent>Watercolor</extent>
</physdesc>
<unitdate>[1817]</unitdate>
</did>
<dao ns2:actuate="onRequest" ns2:show="embed" ns2:role="" ns2:href="http://diglib.amphilsoc.org/fedora/repository/graphics:87"/>
</c02>
Following advice I found elsewhere, I was trying this (and variations on this theme):
<?php
$entries = simplexml_load_file('test.xml');
foreach ($entries->c02->children('http://www.w3.org/1999/xlink') as $entry) {
echo 'link: ', $entry->children('dao', true)->href, "\n";
}
?>
Which, of course, isn't working.
You have to understand the difference between a namespace and a namespace prefix. The namespace is the value inside the xmlns attributes. The xmlns attributes define the prefix, which is an alias for the actual namespace for that node and its descendants.
In your example there are three namespaces:
http://www.w3.org/1999/xlink with the alias "ns2"
urn:isbn:1-931666-22-9 without an alias
http://www.w3.org/2001/XMLSchema-instance with the alias "xsi"
So elements and attributes starting with "ns2:" are in the xlink namespace, and elements and attributes starting with "xsi:" are in the XML Schema instance namespace. All elements without a namespace prefix are in the isbn-specific namespace. Attributes without a namespace prefix are always in NO namespace.
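The difference between elements and attributes can be checked directly on the DOM. This is only a small sketch on a trimmed version of the document from the question:

```php
<?php
$source = <<<'XML'
<ead xmlns="urn:isbn:1-931666-22-9" xmlns:ns2="http://www.w3.org/1999/xlink">
  <dao id="ref001" ns2:href="http://diglib.amphilsoc.org/fedora/repository/graphics:92"/>
</ead>
XML;

$dom = new DOMDocument();
$dom->loadXML($source);
$dao = $dom->getElementsByTagName('dao')->item(0);

// Unprefixed element: inherits the default namespace.
$elementNs = $dao->namespaceURI;

// Unprefixed attribute: no namespace at all.
$plainAttrNs = $dao->getAttributeNode('id')->namespaceURI;

// Prefixed attribute: the namespace bound to "ns2".
$xlinkAttrNs = $dao
    ->getAttributeNodeNS('http://www.w3.org/1999/xlink', 'href')
    ->namespaceURI;

var_dump($elementNs, $plainAttrNs, $xlinkAttrNs);
```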
If you query the XML DOM, you need to define your own namespace prefixes. The namespace prefixes in XML documents can change, especially if they come from external resources.
I don't use "SimpleXML", so here is a DOM example:
<?php
$xml = <<<'XML'
<?xml version="1.0"?>
<ead
xsi:schemaLocation="urn:isbn:1-931666-22-9 http://www.loc.gov/ead/ead.xsd"
xmlns:ns2="http://www.w3.org/1999/xlink"
xmlns="urn:isbn:1-931666-22-9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<c02 id="ref24" level="item">
<did>
<unittitle>"Lepidoptera and seas(on) of appearance"</unittitle>
</did>
</c02>
</ead>
XML;
// create dom and load the xml
$dom = new DOMDocument();
$dom->loadXml($xml);
// create an xpath object
$xpath = new DOMXpath($dom);
// register your own namespace prefix
$xpath->registerNamespace('isbn', 'urn:isbn:1-931666-22-9');
foreach ($xpath->evaluate('//isbn:unittitle', NULL, FALSE) as $node) {
var_dump($node->textContent);
}
Output:
string(40) ""Lepidoptera and seas(on) of appearance""
Xpath is quite powerful and the most comfortable way to extract data from XML.
The default namespace in your case is weird. It looks like it is dynamic, so you might need a way to read it. Here is the Xpath for that:
$defaultNamespace = $xpath->evaluate('string(/*/namespace::*[name() = ""])');
It reads the namespace without a prefix from the document element.
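Putting the pieces together for the links asked about in the question, a sketch could look like this. The prefixes "e" and "xlink" are arbitrary names chosen for the XPath side, and the XML is a trimmed version of the sample:

```php
<?php
$source = <<<'XML'
<?xml version="1.0"?>
<ead xmlns:ns2="http://www.w3.org/1999/xlink" xmlns="urn:isbn:1-931666-22-9">
  <c02 id="ref24" level="item">
    <dao id="ref001" ns2:href="http://diglib.amphilsoc.org/fedora/repository/graphics:92"/>
  </c02>
  <c02 id="ref25" level="item">
    <dao ns2:href="http://diglib.amphilsoc.org/fedora/repository/graphics:87"/>
  </c02>
</ead>
XML;

$dom = new DOMDocument();
$dom->loadXML($source);
$xpath = new DOMXPath($dom);

// Read whatever default namespace the document element declares.
$defaultNamespace = $xpath->evaluate('string(/*/namespace::*[name() = ""])');

// Bind our own prefixes; they do not have to match the document's.
$xpath->registerNamespace('e', $defaultNamespace);
$xpath->registerNamespace('xlink', 'http://www.w3.org/1999/xlink');

// Collect the xlink:href attribute of every dao element.
$links = [];
foreach ($xpath->evaluate('//e:c02/e:dao/@xlink:href') as $href) {
    $links[] = $href->value;
}

print_r($links);
```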
I have xml with the following structure:
<?xml version="1.0"?>
<ONIXMessage xmlns="http://test.com/test">
...data...
</ONIXMessage>
I need to change xmlns attribute with my own value. How can I do it? Preferably with DOMDocument class.
This by design is not possible. Every DOMDocument has a single root/document element.
In your example XML that root element is:
{http://test.com/test}ONIXMessage
I write the element name as an expanded name, with the convention of putting the namespace URI in front, enclosed in curly braces.
Writing the element name in a form that shows its entire expanded name also demonstrates that you do not only want to change the value of an attribute here: you want to change the namespace URI of a specific element. So you want to change the element name, and probably also the name of any child element it contains if the child is in the same namespace.
As the xmlns attribute only reflects the namespace URI of the element itself, you can not change it. Once it is set in DOMDocument, you can not change it.
You can replace the whole element, but the namespace of the children is not changed by that either. Here is an example with XML similar to yours that has only text-node children (which aren't namespaced):
$xml = <<<EOD
<?xml version="1.0"?>
<ONIXMessage xmlns="uri:old">
...data...
</ONIXMessage>
EOD;
$doc = new DOMDocument();
$doc->loadXML($xml);
$newNode = $doc->createElementNS('uri:new', $doc->documentElement->tagName);
$oldNode = $doc->replaceChild($newNode, $doc->documentElement);
foreach(iterator_to_array($oldNode->childNodes, true) as $child) {
$doc->documentElement->appendChild($child);
}
Resulting XML output is:
<?xml version="1.0"?>
<ONIXMessage xmlns="uri:new">
...data...
</ONIXMessage>
Changing the input XML now to something that contains children like
<?xml version="1.0"?>
<ONIXMessage xmlns="uri:old">
<data>
...data...
</data>
</ONIXMessage>
Will then create the following output, take note of the old namespace URI that pops up now again:
<?xml version="1.0"?>
<ONIXMessage xmlns="uri:new">
<default:data xmlns:default="uri:old">
...data...
</default:data>
</ONIXMessage>
As you can see, DOMDocument does not provide functionality to replace namespace URIs for existing elements out of the box. But hopefully, with the information provided in this answer so far, it is clearer why exactly it is not possible to change that attribute's value if it already exists.
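If you want to stay with DOMDocument anyway, one option is to rebuild the tree into a second document and create every element anew in the target namespace. The following is only a sketch: copyIntoNamespace() is a hypothetical helper written for this answer, not part of the DOM extension, and it has only been thought through for simple documents like the one above.

```php
<?php
// Hypothetical helper: copies $node into $target, moving every element
// that is in namespace $from into namespace $to.
function copyIntoNamespace(DOMNode $node, DOMDocument $target, string $from, string $to): DOMNode
{
    if (!$node instanceof DOMElement) {
        // Text, comments, CDATA sections etc. carry no namespace: import as-is.
        return $target->importNode($node, true);
    }

    $uri = $node->namespaceURI === $from ? $to : $node->namespaceURI;
    $copy = $uri === null
        ? $target->createElement($node->nodeName)
        : $target->createElementNS($uri, $node->nodeName);

    // Copy regular attributes (xmlns declarations are not part of this list).
    foreach ($node->attributes as $attr) {
        if ($attr->namespaceURI === null) {
            $copy->setAttribute($attr->nodeName, $attr->value);
        } else {
            $copy->setAttributeNS($attr->namespaceURI, $attr->nodeName, $attr->value);
        }
    }

    foreach ($node->childNodes as $child) {
        $copy->appendChild(copyIntoNamespace($child, $target, $from, $to));
    }

    return $copy;
}

$source = <<<'XML'
<?xml version="1.0"?>
<ONIXMessage xmlns="uri:old">
  <data>...data...</data>
</ONIXMessage>
XML;

$old = new DOMDocument();
$old->loadXML($source);

$new = new DOMDocument('1.0');
$new->appendChild(copyIntoNamespace($old->documentElement, $new, 'uri:old', 'uri:new'));

$out = $new->saveXML();
echo $out;
```

Because the elements are created fresh rather than moved, the old namespace URI has no node left to hang on to. Attributes that are themselves in the old namespace are left untouched by this sketch.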
The expat based parser in the libxml based PHP extension does allow you to "change" existing attribute values, regardless of whether it is an xmlns* attribute or not, because it just parses the data and you can process it on the fly.
A working example is:
$xml = <<<EOD
<?xml version="1.0" encoding="utf-8"?>
<ONIXMessage xmlns="uri:old">
<data>
...data...
</data>
</ONIXMessage>
EOD;
$uriReplace = [
'uri:old' => 'uri:new',
];
$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_default_handler($parser, function ($parser, $data) {
echo $data;
});
xml_set_element_handler($parser, function ($parser, $name, $attribs) use ($xml, $uriReplace) {
$selfClosing = '/>' === substr($xml, xml_get_current_byte_index($parser), 2);
echo '<', $name;
foreach ($attribs as $name => $value) {
if (substr($name, 0, 5) === 'xmlns' && isset($uriReplace[$value])) {
$value = $uriReplace[$value];
}
printf(' %s="%s"', $name, htmlspecialchars($value, ENT_COMPAT | ENT_XML1));
}
echo $selfClosing ? '/>' : '>';
}, function ($parser, $name) use ($xml) {
$selfClosing = '/>' === substr($xml, xml_get_current_byte_index($parser) - 2, 2);
if ($selfClosing) return;
echo '</', $name, '>';
});
xml_parse($parser, $xml, true);
xml_parser_free($parser);
The output then has transparently changed the namespace URI from uri:old to uri:new:
<ONIXMessage xmlns="uri:new">
<data>
...data...
</data>
</ONIXMessage>
As this example shows, each XML feature you make use of in your XML needs to be handled with the parser. For example, the XML declaration is missing from the output. However, such things can be added by implementing the missing handler callbacks (e.g. for CDATA sections) or by writing the missing output yourself (e.g. for the "missing" XML declaration). I hope this is helpful and shows you an alternative way to change even these values that are not intended to change.
A bit of background:
Hi guys,
I'm creating a router in PHP for an MVC application and decided that the structure would be in XML. I have an XML file containing all valid routes (pages) in the system along with their associated controller & action.
There's also a 'param' to indicate if there's a variable on the end of the URI and the variable name to assign it to (confusingly named I know!!)
What I'm doing is looking at the REQUEST_URI and using PHP's explode function to turn it into an array of 'route' elements which I then build a query for.
Here's some sample XML:
<routes>
<route>
<url>blog</url>
<params>
<controller>blogController</controller>
<action>indexAction</action>
</params>
<route>
<url>entry</url>
<params>
<controller>blogController</controller>
<action>entryAction</action>
<param>entryId</param>
</params>
</route>
</route>
</routes>
And here's the query that's being built:
/routes/route/url[text()="blog"]/../route/url[text()="entry"]/..
This always seems to return 0 nodes in PHPs XPath, but using an online expression tester I get the entry route matched.
Can anyone explain what might be going wrong? Does PHP's XPath parser understand this syntax? I have also tried the parent::* method.
Cheers!
You shouldn't need .. or parent::*.
Try this instead:
/routes/route[url="blog"]/route[url="entry"]
You shouldn't need text() either, but I also don't know PHP very well. If the above doesn't work, try:
/routes/route[url/text()="blog"]/route[url/text()="entry"]
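For what it's worth, here is a quick way to check the first expression in PHP. It is just a sketch that runs it with SimpleXML against the sample XML from the question:

```php
<?php
$source = <<<'XML'
<routes>
  <route>
    <url>blog</url>
    <params>
      <controller>blogController</controller>
      <action>indexAction</action>
    </params>
    <route>
      <url>entry</url>
      <params>
        <controller>blogController</controller>
        <action>entryAction</action>
        <param>entryId</param>
      </params>
    </route>
  </route>
</routes>
XML;

$routes = simplexml_load_string($source);

// The predicate filters the route element itself, so no parent step is needed.
$matches = $routes->xpath('/routes/route[url="blog"]/route[url="entry"]');
$route = $matches[0];

echo $route->params->controller, '::', $route->params->action, "\n";
```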
XPath-wise I come to the same conclusion as DevNull, with only a slight addition to select the first match:
/routes/route[url="blog"]/route[url="entry"][1]
With an object interface:
$routes = new RoutesXML($xml);
var_dump($routes->fromPath('blog/entry')); # The SimpleXMLElement
var_dump($routes->fromPath('blog/entry2')); # NULL
Example Implementation:
class RoutesXML
{
private $xml;
public function __construct($xml) {
$this->xml = simplexml_load_string($xml);
}
public function fromPath($path) {
$expression = '/routes';
foreach(explode('/', $path) as $element)
$expression .= "/route[url='$element']";
$expression .= '[1]';
list($route) = $this->xml->xpath($expression) + array(NULL);
return $route;
}
}
This sounds like a pretty easy question to answer but I haven't been able to get it to work. I'm running PHP 5.2.6.
I have a DOM element (the root element) which, when I go to $element->saveXML(), it outputs an xmlns attribute:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
...
However, I cannot find any way programmatically within PHP to see that namespace. I want to be able to check whether it exists and what it's set to.
Checking $document->documentElement->namespaceURI would be the obvious answer but that is empty (I've never actually been able to get that to be non-empty). What is generating that xmlns value in the output and how can I read it?
The only practical way I've been able to do this so far is a complete hack - by saving it as XML to a string using saveXML() then reading through that using regular expressions.
Edit:
This may be a peculiarity of loading XML in using loadHTML() rather than loadXML() and then printing it out using saveXML(). When you do that, it appears that for some reason saveXML adds an xmlns attribute even though there is no way to detect that this xmlns value is part of the document using DOM methods. Which I guess means that if I had a way of detecting whether the document passed in had been loaded in using loadHTML() then I could solve this a different way.
As edorian already showed, getting the namespace works fine when the markup is loaded with loadXML. But you are right that this won't work for markup loaded with loadHTML:
$html = <<< XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:m="foo" lang="en">
<body xmlns="foo">Bar</body>
</html>
XML;
$dom = new DOMDocument;
$dom->loadHTML($html);
var_dump($dom->documentElement->getAttribute("xmlns"));
var_dump($dom->documentElement->lookupNamespaceURI(NULL));
var_dump($dom->documentElement->namespaceURI);
will produce empty results. But you can use XPath
$xp = new DOMXPath($dom);
echo $xp->evaluate('string(@xmlns)');
// http://www.w3.org/1999/xhtml;
and for body
echo $xp->evaluate('string(body/@xmlns)'); // foo
or with context node
$body = $dom->documentElement->childNodes->item(0);
echo $xp->evaluate('string(@xmlns)', $body);
// foo
My uneducated assumption is that internally, an HTML document is different from a real document. Internally libxml uses a different module to parse HTML, and the DOMDocument itself will be of a different nodeType, as you can simply verify by doing
var_dump($dom->nodeType); // 13 with loadHTML, 9 with loadXml
with 13 being a XML_HTML_DOCUMENT_NODE.
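That nodeType difference also gives you a way to detect how a document was loaded, which is what the edit in the question asks for. A minimal sketch, where isHtmlDocument() is a hypothetical helper name made up for this answer:

```php
<?php
// Hypothetical helper: true when the document was built by the HTML parser.
function isHtmlDocument(DOMDocument $doc): bool
{
    return $doc->nodeType === XML_HTML_DOCUMENT_NODE;
}

$markup = '<html xmlns="http://www.w3.org/1999/xhtml" lang="en"><body>Bar</body></html>';

$asXml = new DOMDocument();
$asXml->loadXML($markup);

$asHtml = new DOMDocument();
$asHtml->loadHTML($markup);

var_dump(isHtmlDocument($asXml));
var_dump(isHtmlDocument($asHtml));

// With loadXML the namespace is available the usual way.
var_dump($asXml->documentElement->namespaceURI);
```

For documents that come back as HTML you can then fall back to the XPath attribute lookup shown above.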
With PHP 5.2.6 I've found two ways to do this:
<?php
$xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?'.
'><html xmlns="http://www.w3.org/1999/xhtml" lang="en"></html>';
// calling loadXML() statically is deprecated, so create an instance first
$x = new DOMDocument();
$x->loadXML($xml);
var_dump($x->documentElement->getAttribute("xmlns"));
var_dump($x->documentElement->lookupNamespaceURI(NULL));
prints
string(28) "http://www.w3.org/1999/xhtml"
string(28) "http://www.w3.org/1999/xhtml"
Hope that's what you asked for :)
Well, you can do so with a function like this:
function getNamespaces(DomNode $node, $recurse = false) {
$namespaces = array();
if ($node->namespaceURI) {
$namespaces[] = $node->namespaceURI;
}
if ($node instanceof DomElement && $node->hasAttribute('xmlns')) {
$namespaces[] = $xmlns = $node->getAttribute('xmlns');
foreach ($node->attributes as $attr) {
if ($attr->namespaceURI == $xmlns) {
$namespaces[] = $attr->value;
}
}
}
if ($recurse && $node instanceof DomElement) {
foreach ($node->childNodes as $child) {
$namespaces = array_merge($namespaces, getNamespaces($child, true));
}
}
return array_unique($namespaces);
}
So, you feed it a DOMElement, and then it finds all related namespaces:
$xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/1999/xhtml"
lang="en"
xmlns:foo="http://example.com/bar">
<body>
<h1>foo</h1>
<foo:h2>bar</foo:h2>
</body>
</html>';
// load the markup first so that $dom is defined
$dom = new DOMDocument();
$dom->loadXML($xml);
var_dump(getNamespaces($dom->documentElement, true));
Prints out:
array(2) {
[0]=>
string(28) "http://www.w3.org/1999/xhtml"
[3]=>
string(22) "http://example.com/bar"
}
Note that DomDocument will automatically strip out all unused namespaces...
As for why $dom->documentElement->namespaceURI is always null, it's because the document element doesn't have a namespace. The xmlns attribute provides a default namespace for the document, but it doesn't endow the html tag with a namespace (for purposes of DOM interaction). You can try doing a $dom->documentElement->removeAttribute('xmlns'), but I'm not 100% sure if it will work...