I am using the DOM extension to parse an xml file containing xml namespaces. I would have thought that namespace declarations are treated just like any other attribute, but my tests seem to disagree. I have a document that starts like this:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/"
xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
xmlns:prism="http://purl.org/rss/1.0/modules/prism/"
xmlns:admin="http://webns.net/mvcb/"
>
And a test code like this:
$doc = new DOMDocument();
$doc->loadXml(file_get_contents('/home/soulmerge/tmp/rss1.0/recent.xml'));
$root = $doc->documentElement;
var_dump($root->tagName);
# prints 'string(7) "rdf:RDF"'
var_dump($root->attributes->item(0));
# prints 'NULL'
var_dump($root->getAttributeNode('xmlns'));
# prints 'object(DOMNameSpaceNode)#3 (0) {}'
So the questions are:
Does anyone know where I could find the documentation of DOMNameSpaceNode? A search on php.net does not yield any useful results.
How do I extract all those namespace declarations from that DOMElement?
Unless there is a more direct way, you can use XPath and its namespace axis, e.g.:
<?php
$doc = new DOMDocument;
$doc->loadxml('<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/"
xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
xmlns:prism="http://purl.org/rss/1.0/modules/prism/"
xmlns:admin="http://webns.net/mvcb/"
>
...
</rdf:RDF>');
$context = $doc->documentElement;
$xpath = new DOMXPath($doc);
foreach( $xpath->query('namespace::*', $context) as $node ) {
echo $node->nodeValue, "\n";
}
prints
http://www.w3.org/XML/1998/namespace
http://webns.net/mvcb/
http://purl.org/rss/1.0/modules/prism/
http://purl.org/rss/1.0/modules/syndication/
http://purl.org/dc/elements/1.1/
http://purl.org/rss/1.0/modules/taxonomy/
http://purl.org/rss/1.0/
http://www.w3.org/1999/02/22-rdf-syntax-ns#
edit and btw: I haven't found documentation for DOMNameSpaceNode either. But you can deduce (parts of) its functionality from the source code in ext/dom/php_dom.c
It doesn't seem to expose any methods and exposes the properties
"nodeName", "nodeValue", "nodeType",
"prefix", "localName", "namespaceURI",
"ownerDocument", "parentNode"
all handled by the same functions as the corresponding DOMNode properties.
Note that
echo $root->getAttributeNode('xmlns')->nodeValue . "\n";
echo $root->getAttribute('xmlns') . "\n";
echo $root->getAttribute('xmlns:syn') . "\n";
all work as expected, and print out
http://purl.org/rss/1.0/
http://purl.org/rss/1.0/
http://purl.org/rss/1.0/modules/syndication/
because DOMNameSpaceNode is a Node, not a NodeCollection.
Just to clarify: unless something in the PHP DOM extension changes, XPath (as explained by VolkerK) is the only native way to get all the namespaces, regardless of documentation.
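For reference, a minimal sketch of turning the namespace axis result into a prefix => URI map. It assumes the `prefix` and `nodeValue` properties of DOMNameSpaceNode behave as described above; the helper name `namespaceMap` is made up for illustration and is not part of the DOM extension.

```php
<?php
// Hypothetical helper: collect the namespaces in scope on an element
// as a prefix => URI map, using the XPath namespace axis.
function namespaceMap(DOMElement $element): array
{
    $xpath = new DOMXPath($element->ownerDocument);
    $map = [];
    foreach ($xpath->query('namespace::*', $element) as $node) {
        // The default namespace has no prefix, so it ends up under the key ''.
        $map[(string) $node->prefix] = $node->nodeValue;
    }
    return $map;
}

$doc = new DOMDocument();
$doc->loadXML('<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"/>');

$namespaces = namespaceMap($doc->documentElement);
foreach ($namespaces as $prefix => $uri) {
    echo ($prefix === '' ? '(default)' : $prefix), ' => ', $uri, "\n";
}
```

As in the output shown earlier, the implicit `xml` namespace is part of the result, so filter it out if only the declared namespaces are wanted.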
I want to read this xml document:
<?xml version="1.0" encoding="UTF-8"?>
<tns:getPDMNumber xmlns:tns="http://www.testgroup.com/TestPDM" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.testgroup.com/TestPDM getPDMNumber.xsd ">
<tns:getPDMNumberResponse>
<tns:requestID>22222</tns:requestID>
<tns:pdmNumber>654321</tns:pdmNumber>
<tns:responseCode>0</tns:responseCode>
</tns:getPDMNumberResponse>
</tns:getPDMNumber>
I tried it this way:
$dom->load('response/17_getPDMNumberResponse.xml');
$nodes = $dom->getElementsByTagName("tns:requestID");
//$nodes = $dom->getElementsByTagName("tns:getPDMNumber");
//$nodes = $dom->getElementsByTagName("tns:getPDMNumberResponse");
foreach($nodes as $node)
{
$response=$node->getElementsByTagName("tns:getPDMNumber");
foreach($response as $info)
{
$test = $info->getElementsByTagName("tns:pdmNumber");
$pdm = $test->nodeValue;
}
}
The code never enters the foreach loop.
Just to clarify, my goal is to read the "tns:pdmNumber" node.
Does anybody have an idea?
EDIT: I have also tried the commented-out lines.
The XML uses a namespace, so you should use the namespace-aware methods. They have the suffix NS, for example getElementsByTagNameNS().
$tns = 'http://www.testgroup.com/TestPDM';
$document = new DOMDocument();
$document->loadXml($xml);
foreach ($document->getElementsByTagNameNS($tns, "pdmNumber") as $node) {
var_dump($node->textContent);
}
Output:
string(6) "654321"
A better option is to use XPath expressions. They allow more comfortable access to DOM nodes. In this case you have to register a prefix for the namespace so that you can use it in the XPath expression:
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('t', 'http://www.testgroup.com/TestPDM');
var_dump(
$xpath->evaluate('string(/t:getPDMNumber/t:getPDMNumberResponse/t:pdmNumber)')
);
With this line:
$nodes = $dom->getElementsByTagName("tns:requestID");
you find all the requestID nodes and loop over them. That's fine, but then you use each of those nodes as the starting point to find getPDMNumber nodes UNDER the requestID, and there are none: requestID is a terminal node. So
$response=$node->getElementsByTagName("tns:getPDMNumber");
finds nothing, and the inner loop has nothing to do.
It's like saying "Start digging a hole until you reach China. Once you reach China, keep digging until you reach Australia". But you can't keep digging: you've reached the "bottom", and the only thing deeper than China would be going into orbit.
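A minimal sketch of the traversal in the right direction, assuming the response document from the question: start at the container element and look for the value below it. It uses the namespace-aware getElementsByTagNameNS(); the variable names are chosen for illustration only.

```php
<?php
$xml = '<?xml version="1.0" encoding="UTF-8"?>
<tns:getPDMNumber xmlns:tns="http://www.testgroup.com/TestPDM">
  <tns:getPDMNumberResponse>
    <tns:requestID>22222</tns:requestID>
    <tns:pdmNumber>654321</tns:pdmNumber>
    <tns:responseCode>0</tns:responseCode>
  </tns:getPDMNumberResponse>
</tns:getPDMNumber>';

$tns = 'http://www.testgroup.com/TestPDM';
$dom = new DOMDocument();
$dom->loadXML($xml);

$pdm = null;
// Outer loop: the container elements.
foreach ($dom->getElementsByTagNameNS($tns, 'getPDMNumberResponse') as $response) {
    // Inner lookup: descendants of that container.
    $numbers = $response->getElementsByTagNameNS($tns, 'pdmNumber');
    if ($numbers->length > 0) {
        // A node list has no nodeValue of its own; pick one item first.
        $pdm = $numbers->item(0)->nodeValue;
    }
}
var_dump($pdm);
```

Note that the original code also read nodeValue from the node list itself rather than from one of its items, which would not return the text even if the loops had run.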
I'm attempting to echo (or assign to a variable) the contents of the "code" attribute, which is inside status.
I can get request-id just fine...
Any ideas people?
<?php
$responseXML = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<payment xmlns="http://www.example.com" self="http://www.example.com">
<merchant-account-id ref="http://www.example.com">0000</merchant-account-id>
<transaction-id>0000</transaction-id>
<request-id>0000</request-id>
<transaction-type>auth</transaction-type>
<transaction-state>success</transaction-state>
<completion-time-stamp>2015-12-28T17:39:25.000Z</completion-time-stamp>
<statuses>
<status code="201.0000" description="3d-acquirer:The resource was successfully created." severity="information"/>
</statuses>
<avs-code>P</avs-code>
<requested-amount currency="GBP">0.01</requested-amount>
<account-holder>
<first-name>test</first-name>
<last-name>test</last-name>
<email>test.test@hotmail.co.uk</email>
<phone>00000000000</phone>
<address>
<street1>test</street1>
<city>test test</city>
<state>test</state>
<country>GB</country>
</address>
</account-holder>
<card-token>
<token-id>000</token-id>
<masked-account-number>000000******0000</masked-account-number>
</card-token>
<ip-address>192.168.0.1</ip-address>
<descriptor></descriptor>
<authorization-code>000000</authorization-code>
<api-id>000-000</api-id>
</payment>';
$doc = new DOMDocument;
$doc->loadXML($responseXML);
echo $doc->getElementsByTagName('request-id')->item(0)->nodeValue;
echo $doc->getElementsByTagName('status code')->item(0)->nodeValue;
?>
I've tried simplexml_load_string, but I'm pulling my hair out with this one. Can anybody shed some light? Getting this info out quickly in one pass is quite important, so as not to stress the web server!
Many thanks.
Using DOM is a good idea, but the API methods are a little cumbersome. Using Xpath makes it a lot easier.
Xpath allows you to use expressions to fetch node lists or scalar values from a DOM:
$document = new DOMDocument;
$document->loadXML($responseXML);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('example', 'http://www.example.com');
echo $xpath->evaluate('string(//example:request-id)'), "\n";
echo $xpath->evaluate('string(//example:status/@code)');
Output:
0000
201.0000
XPath does not have a default namespace, so if your XML has a namespace (like your example) you need to register a prefix for it and use that prefix in the expression.
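A small sketch of that point, using a cut-down version of the payment document: the same element is counted once without and once with a registered prefix. The prefix `p` is an arbitrary choice made here; only the namespace URI has to match the document.

```php
<?php
$xml = '<payment xmlns="http://www.example.com">
  <request-id>0000</request-id>
</payment>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// Unprefixed names in XPath 1.0 mean "no namespace", so this matches nothing.
$unprefixed = $xpath->evaluate('count(//request-id)');

// Bind the document's namespace URI to a prefix of our own and use it.
$xpath->registerNamespace('p', 'http://www.example.com');
$prefixed = $xpath->evaluate('count(//p:request-id)');

var_dump($unprefixed, $prefixed);
```

The first count should come back as zero and the second as one, which is why the unprefixed getElementsByTagName-style thinking does not carry over to XPath.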
As code is an attribute of the status element, calling
getElementsByTagName('status code')
is wrong.
There is a dedicated method for reading an attribute value, getAttribute:
echo $doc->getElementsByTagName('status')->item(0)->getAttribute('code');
Using XPath allows you to access the status node very precisely.
DOMDocument + XPath:
$responseXML = '...';
$doc = new DOMDocument();
$doc->loadXML($responseXML);
$xp = new DOMXpath($doc);
$xp->registerNamespace('example', 'http://www.example.com');
// Every status node.
$statusNodes = $xp->query('//example:status');
// or a very specific one.
$statusNodes = $xp->query('/example:payment/example:statuses/example:status');
$statusNode = $statusNodes[0];
$code = $statusNode->getAttribute('code');
// $code is '201.0000'.
// To change the 'code' value.
$statusNode->setAttribute('code', '302.0000');
I received this XML from an external service:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href=(...)?>
<pos:Document xmlns:pos=(...) xmlns:str=(...) xmlns:xsi=(...) xsi:schemaLocation=(...)>
<pos:DescribeDocument>
(...)
</pos:DescribeDocument>
<pos:UPP>
(...)
</pos:UPP>
<ds:Signature Id="ID-9326" xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
<ds:SignedInfo Id="ID-9325" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:pos="adress" xmlns:str="adress" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
(...)
</ds:SignedInfo>
<ds:Object>
<xades:QualifyingProperties Id="ID-9337a6d1" Target="#ID-932668c0-d4f9-11e3-bb2d-001a645ad128" xmlns:xades="http://uri.etsi.org/01903/v1.3.2#">
<xades:SignedProperties Id="ID-9337a6d0" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:pos="adress" xmlns:str="adress" xmlns:xades="adress" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<xades:SignedSignatureProperties>
<xades:SigningTime>sometime</xades:SigningTime>
<xades:SigningCertificate>
<xades:Cert>
<xades:CertDigest>
<ds:DigestMethod Algorithm="adress"/>
<ds:DigestValue>someValue</ds:DigestValue>
</xades:CertDigest>
<xades:IssuerSerial>
<ds:X509IssuerName>CNsomeValue</ds:X509IssuerName>
<ds:X509SerialNumber>SerialsomeValue</ds:X509SerialNumber>
</xades:IssuerSerial>
</xades:Cert>
</xades:SigningCertificate>
<xades:SignaturePolicyIdentifier>
<xades:SignaturePolicyImplied/>
</xades:SignaturePolicyIdentifier>
</xades:SignedSignatureProperties>
<xades:SignedDataObjectProperties>
<xades:DataObjectFormat ObjectReference="#ID-93257e60">
<xades:Description>NEEDVALUE</xades:Description>
</xades:DataObjectFormat>
</xades:SignedDataObjectProperties>
</xades:SignedProperties>
</xades:QualifyingProperties>
</ds:Object>
</ds:Signature>
</pos:Document>
It has several namespaces, and I have to get the value of the xades:Description element (NEEDVALUE in the sample).
I wrote some code, but nothing works:
$xmlFileContent = file_get_contents($pathToXML);
$dom = new SimpleXMLElement($xmlFileContent, LIBXML_COMPACT);
$namespaces = $dom->getNamespaces(true);
foreach ($namespaces as $key => $namespace) {
$dom->registerXPathNamespace($key, $namespace);
}
$matches = $dom->xpath('//xades:Description'); //no success
and
$doms = new DOMDocument;
$doms->loadXML($path);
foreach($doms->getElementsByTagNameNS($namespaces['xades'],'*') as $element){
echo 'local name: ', $element->localName, ', prefix: ', $element->prefix, "\n"; //no success
}
Can you help me get to this node (xades:Description)?
PS:
I also tried this (without success):
$result1 = $dom->xpath('/Dokument/ds:Signature/ds:Object/xades:QualifyingProperties/xades:SignedProperties/xades:SignedDataObjectProperties/xades:DataObjectFormat/xades:Description');
You removed the namespace definitions from your XML. If you see an attribute like 'xmlns:xades' the actual namespace is the value of that attribute. It defines an alias/prefix for that namespace in the current context. Most of the time URLs are used as namespace identifiers (because it avoids potential conflicts). But a value like urn:somestring is valid.
You need to register a prefix for the namespace on the DOMXpath object, too. This prefix does not need to be identical to the one in the document. Look at it this way:
DOMDocument resolves a node name from prefix:Description to {namespace-uri}:Description.
DOMXpath resolves the node names in the Xpath expressions the same way using its own definition. For example //prefix:Description to //{namespace-uri}:Description. Then it compares the resolved names.
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
$xpath->registerNamespace('x', 'urn:xades');
var_dump($xpath->evaluate('string(//x:Description)'));
Output: https://eval.in/181304
string(9) "NEEDVALUE"
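To illustrate that only the namespace URI matters, here is a minimal sketch with two made-up documents that use different prefixes for the same namespace. `urn:xades` is the placeholder URI from the example above, and the prefix `x` appears in neither document.

```php
<?php
// Two documents that bind different prefixes to the same namespace URI.
$documents = [
    '<a:Description xmlns:a="urn:xades">first</a:Description>',
    '<b:Description xmlns:b="urn:xades">second</b:Description>',
];

$values = [];
foreach ($documents as $xml) {
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    // Our own prefix; it is resolved to the URI before names are compared.
    $xpath->registerNamespace('x', 'urn:xades');
    $values[] = $xpath->evaluate('string(//x:Description)');
}
var_dump($values);
```

The same expression finds the element in both documents, so code written this way keeps working if the external service changes its prefixes.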
A possible workaround, in case you can't find a proper solution, is to modify your XPath so that it ignores namespaces:
$result = $dom->xpath('//*[local-name() = "Description"]');
I'm pretty new to this, and I've followed several tutorials (including other SO questions), but I can't seem to get this to work.
I'm working with a library's EAD file (Library of Congress XML standard for describing library collections, http://www.loc.gov/ead/index.html), and I'm having trouble with the namespaces.
A simplified example of the XML:
<?xml version="1.0"?>
<ead xsi:schemaLocation="urn:isbn:1-931666-22-9 http://www.loc.gov/ead/ead.xsd" xmlns:ns2="http://www.w3.org/1999/xlink" xmlns="urn:isbn:1-931666-22-9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<c02 id="ref24" level="item">
<did>
<unittitle>"Lepidoptera and seas(on) of appearance"</unittitle>
<unitid>1</unitid>
<container id="cid71717" type="Box" label="Mixed materials">1</container>
<physdesc>
<extent>Pencil</extent>
</physdesc>
<unitdate>[1817]</unitdate>
</did>
<dao id="ref001" ns2:actuate="onRequest" ns2:show="embed" ns2:role="" ns2:href="http://diglib.amphilsoc.org/fedora/repository/graphics:92"/>
</c02>
<c02 id="ref25" level="item">
<did>
<unittitle>Argus carryntas (Butterfly)</unittitle>
<unitid>2</unitid>
<container id="cid71715" type="Box" label="Mixed materials">1</container>
<physdesc>
<extent>Watercolor</extent>
</physdesc>
<unitdate>[1817]</unitdate>
</did>
<dao ns2:actuate="onRequest" ns2:show="embed" ns2:role="" ns2:href="http://diglib.amphilsoc.org/fedora/repository/graphics:87"/>
</c02>
Following advice I found elsewhere, I was trying this (and variations on this theme):
<?php
$entries = simplexml_load_file('test.xml');
foreach ($entries->c02->children('http://www.w3.org/1999/xlink') as $entry) {
echo 'link: ', $entry->children('dao', true)->href, "\n";
}
?>
Which, of course, isn't working.
You have to understand the difference between a namespace and a namespace prefix. The namespace is the value inside the xmlns attributes. The xmlns attributes define the prefix, which is an alias for the actual namespace for that node and its descendants.
Your example contains three namespaces:
http://www.w3.org/1999/xlink with the alias "ns2"
urn:isbn:1-931666-22-9 without an alias
http://www.w3.org/2001/XMLSchema-instance with the alias "xsi"
So elements and attributes starting with "ns2:" are in the xlink namespace, and elements and attributes starting with "xsi:" are in the XML Schema instance namespace. All elements without a namespace prefix are in the ISBN-specific namespace. Attributes without a namespace prefix are always in NO namespace.
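A short sketch of these rules on a cut-down version of the EAD sample, using DOM. It inspects the namespace of an unprefixed element, an unprefixed attribute, and a prefixed attribute; the variable names are chosen for illustration.

```php
<?php
$xml = '<ead xmlns="urn:isbn:1-931666-22-9" xmlns:ns2="http://www.w3.org/1999/xlink">
  <dao id="ref001" ns2:href="http://diglib.amphilsoc.org/fedora/repository/graphics:92"/>
</ead>';

$xlink = 'http://www.w3.org/1999/xlink';
$dom = new DOMDocument();
$dom->loadXML($xml);
$dao = $dom->getElementsByTagNameNS('urn:isbn:1-931666-22-9', 'dao')->item(0);

// Unprefixed element: it is in the default namespace.
$elementNs = $dao->namespaceURI;
// Unprefixed attribute: it is in no namespace at all (null).
$idNs = $dao->getAttributeNode('id')->namespaceURI;
// Prefixed attribute: read it by namespace URI and local name, not by prefix.
$href = $dao->getAttributeNS($xlink, 'href');

var_dump($elementNs, $idNs, $href);
```

Reading the link through getAttributeNS() with the xlink URI avoids any dependency on the document's "ns2" prefix.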
If you query the XML DOM, you need to define your own namespace prefixes. The namespace prefixes in XML documents can change, especially if the documents come from external sources.
I don't use "SimpleXML", so here is a DOM example:
<?php
$xml = <<<'XML'
<?xml version="1.0"?>
<ead
xsi:schemaLocation="urn:isbn:1-931666-22-9 http://www.loc.gov/ead/ead.xsd"
xmlns:ns2="http://www.w3.org/1999/xlink"
xmlns="urn:isbn:1-931666-22-9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<c02 id="ref24" level="item">
<did>
<unittitle>"Lepidoptera and seas(on) of appearance"</unittitle>
</did>
</c02>
</ead>
XML;
// create dom and load the xml
$dom = new DOMDocument();
$dom->loadXml($xml);
// create an xpath object
$xpath = new DOMXpath($dom);
// register your own namespace prefix
$xpath->registerNamespace('isbn', 'urn:isbn:1-931666-22-9');
foreach ($xpath->evaluate('//isbn:unittitle', NULL, FALSE) as $node) {
var_dump($node->textContent);
}
Output:
string(40) ""Lepidoptera and seas(on) of appearance""
Xpath is quite powerful and the most comfortable way to extract data from XML.
The default namespace in your case is unusual. It looks like it is dynamic, so you might need a way to read it. Here is the XPath for that:
$defaultNamespace = $xpath->evaluate('string(/*/namespace::*[name() = ""])');
It reads the namespace without a prefix from the document element.
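A minimal sketch of how that value could then be used, assuming the document element actually declares a default namespace: read it, bind it to a prefix of your own, and query with that prefix. The prefix `e` and the cut-down sample document are choices made for this illustration.

```php
<?php
$xml = '<ead xmlns="urn:isbn:1-931666-22-9">
  <c02><did><unittitle>Argus carryntas (Butterfly)</unittitle></did></c02>
</ead>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Read the prefix-less namespace from the document element ...
$defaultNamespace = $xpath->evaluate('string(/*/namespace::*[name() = ""])');

$titles = [];
if ($defaultNamespace !== '') {
    // ... and bind it to our own prefix before querying.
    $xpath->registerNamespace('e', $defaultNamespace);
    foreach ($xpath->evaluate('//e:unittitle') as $node) {
        $titles[] = $node->textContent;
    }
}
var_dump($defaultNamespace, $titles);
```

The guard covers documents without a default namespace, where the expression yields an empty string and there is nothing to register.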
I am using PHP's SimpleXML and XPath to parse an RDF/XML file and am struggling to get a list of all the rdf:about values.
Any advice?
There seems to be an issue when using SimpleXml with namespaced attributes prior to PHP5.3. Basically, anything with a : will be dropped when converted to an object property of a SimpleXml element. The following will do, but feels hackish to me:
$rdf = str_replace('rdf:about', 'rdf_about', $rdf);
$rdf = new SimpleXMLElement($rdf);
foreach($rdf->xpath('//@rdf_about') as $node) {
echo $node, PHP_EOL;
}
See here:
http://groups.google.com/group/comp.lang.php/browse_thread/thread/d2a9b29ee21f7403/c6b24b6d398ece2c
You could use DOM instead of SimpleXml:
$dom = new DomDocument;
$dom->loadXml($rdf);
$xph = new DOMXPath($dom);
$xph->registerNamespace('rdf', "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
foreach($xph->query('//@rdf:about') as $attribute) {
echo $attribute->value, PHP_EOL;
}
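For completeness, a sketch of the same lookup with SimpleXML and a registered prefix, which avoids the string replacement. It assumes a reasonably current PHP version; the RDF sample and its example.com URLs are made up for illustration.

```php
<?php
// Made-up RDF sample; the resource URLs are hypothetical.
$rdf = '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.com/one"><title>One</title></item>
  <item rdf:about="http://example.com/two"><title>Two</title></item>
</rdf:RDF>';

$sx = new SimpleXMLElement($rdf);
// Register the prefix used in the XPath expression below.
$sx->registerXPathNamespace('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#');

$abouts = [];
foreach ($sx->xpath('//@rdf:about') as $attribute) {
    // Each result wraps one attribute; cast it to get the plain value.
    $abouts[] = (string) $attribute;
}
var_dump($abouts);
```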
However, I suggest using a dedicated RDF library for this rather than SimpleXml or DOM:
http://arc.semsol.org/docs/v2/parsing
http://www.seasr.org/wp-content/plugins/meandre/rdfapi-php/doc/
http://librdf.org/raptor/
http://phpxmlclasses.sourceforge.net/show_doc.php?class=class_rdf_parser.html
And here's a blog post about the parsers:
http://www.wasab.dk/morten/blog/archives/2004/05/31/easy-rdf-parsing-with-php