I am parsing an XML file in PHP with the simplexml_load_file() function. The file uses multiple namespaces:
<?xml version="1.0" encoding="utf-8"?>
<RETURN:RT xmlns:FORMM="http://example.com/5"
xmlns:Form="http://example.com/master"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:RETURN="http://example.com">
<FORMM:5F>
<Form:Info>
<Form:SWCreatedBy>SW10002087</Form:SWCreatedBy>
</Form:Info>
</FORMM:5F>
</RETURN:RT>
Here is my PHP code:
$xml = simplexml_load_file('filename');
$namespaces = $xml->getNamespaces(true);
foreach ($namespaces as $key => $extension) {
$xml->registerXPathNamespace($key, $extension);
}
$main = $xml->xpath('//RETURN:RT/*');
foreach ($main as $itr) {
echo '<pre>';
print_r($itr);
echo '</pre>';
}
Output:
SimpleXMLElement Object
(
)
But when I remove the Form namespace, it gives me this output:
SimpleXMLElement Object
(
[Form:Info] => SimpleXMLElement Object
(
[Form:SWCreatedBy] => SW10002087
)
)
Can anyone give me a solution so that I don't have to remove the Form namespace from the XML file every time?
5F is an invalid tag name: XML names are not allowed to start with a digit. The parser will emit warnings if you load this (invalid) XML:
Warning: simplexml_load_string(): namespace error : Failed to parse QName 'FORMM:' in /in/HHTnS on line 33
Warning: simplexml_load_string(): <FORMM:5F> in /in/HHTnS on line 33
Warning: simplexml_load_string(): ^ in /in/HHTnS on line 33
So you will need some error handling for this and/or have the XML fixed.
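One way to do that error handling is to switch libxml to internal error collection, so the warnings are captured instead of printed. This is a minimal sketch that reuses the question's XML; what you do with the collected errors (here they are just printed) is up to you.

```php
<?php
// The question's XML, which contains the invalid tag name "5F"
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<RETURN:RT xmlns:FORMM="http://example.com/5"
  xmlns:Form="http://example.com/master"
  xmlns:RETURN="http://example.com">
  <FORMM:5F>
    <Form:Info>
      <Form:SWCreatedBy>SW10002087</Form:SWCreatedBy>
    </Form:Info>
  </FORMM:5F>
</RETURN:RT>
XML;

// Collect libxml errors internally instead of emitting PHP warnings
$previous = libxml_use_internal_errors(true);
$rt = simplexml_load_string($xml);
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    // Each entry is a LibXMLError with level, code, line and message properties
    printf("line %d: %s\n", $error->line, trim($error->message));
}

if ($rt === false) {
    throw new RuntimeException('The XML could not be parsed at all.');
}
```

Restoring the previous setting afterwards keeps the change local to this one load.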
However, libxml (the library behind SimpleXML and DOM) tolerates it, so you can still read it.
Don't fetch the namespaces from the document. That method is meant for debugging or generic transformations. The namespace URI is the value that identifies a namespace; prefixes can change on any element in an XML document. Define and register your own prefixes for the XPath expressions, and use the namespace URIs in the method calls.
print_r(), var_dump() and similar functions use SimpleXML's mapping of XML elements. This mapping is a compromise and has limits (especially with namespaces). Use SimpleXMLElement::asXML() to debug a SimpleXMLElement instance.
You can still access the elements using property syntax, but you might need to use SimpleXMLElement::children() to specify the namespace.
Here is an example based on your question:
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<RETURN:RT
xmlns:FORMM="http://example.com/5"
xmlns:Form="http://example.com/master"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:RETURN="http://example.com">
<FORMM:5F>
<Form:Info>
<Form:SWCreatedBy>SW10002087</Form:SWCreatedBy>
</Form:Info>
</FORMM:5F>
</RETURN:RT>
XML;
// define namespaces you're going to use
$xmlns = [
'five' => 'http://example.com/5',
'master' => 'http://example.com/master',
'return' => 'http://example.com',
];
// a function to apply all of them to a SimpleXMLElement
function registerXpathNamespaces(SimpleXMLElement $target, array $xmlns) {
foreach ($xmlns as $prefix => $namespaceURI) {
$target->registerXpathNamespace($prefix, $namespaceURI);
}
}
$rt = simplexml_load_string($xml);
registerXpathNamespaces($rt, $xmlns);
$list = $rt->xpath('//return:RT/*');
foreach ($list as $fiveF) {
// serialize the node into a string for debugging
var_dump($fiveF->asXML());
  // use the namespace list to access child elements in a specific namespace
var_dump(
(string)$fiveF->children($xmlns['master'])->Info->SWCreatedBy
);
}
Output:
string(130) "<FORMM:5F>
<Form:Info>
<Form:SWCreatedBy>SW10002087</Form:SWCreatedBy>
</Form:Info>
</FORMM:5F>"
string(10) "SW10002087"
You need to register the namespaces on each SimpleXMLElement instance you are calling xpath() on.
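For example, every node returned by xpath() is a new SimpleXMLElement, so a relative expression on it needs the prefixes registered again. This is a sketch with a trimmed, hypothetical document and a hypothetical helper named registerPrefixes().

```php
<?php
// Hypothetical prefix map and trimmed XML, modelled on the example above
$xmlns = ['master' => 'http://example.com/master'];

$xml = <<<'XML'
<root xmlns:Form="http://example.com/master">
  <item>
    <Form:Info>
      <Form:SWCreatedBy>SW10002087</Form:SWCreatedBy>
    </Form:Info>
  </item>
</root>
XML;

// Hypothetical helper: register every prefix => URI pair on one instance
function registerPrefixes(SimpleXMLElement $target, array $xmlns): void {
    foreach ($xmlns as $prefix => $namespaceURI) {
        $target->registerXPathNamespace($prefix, $namespaceURI);
    }
}

$root = simplexml_load_string($xml);
registerPrefixes($root, $xmlns);

$created = null;
foreach ($root->xpath('//item') as $item) {
    // $item is a separate SimpleXMLElement instance, so register the prefixes on it too
    registerPrefixes($item, $xmlns);
    $matches = $item->xpath('master:Info/master:SWCreatedBy');
    $created = (string)$matches[0];
}
echo $created, "\n";
```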
In DOM, there is a separate XPath object (DOMXPath), so you only need to register the namespaces once. Additionally, DOMXPath::evaluate() can return scalar values (not just node lists). This is how the example would look with DOM:
// define namespaces you're going to use
$xmlns = [
'five' => 'http://example.com/5',
'master' => 'http://example.com/master',
'return' => 'http://example.com',
];
$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXpath($document);
foreach ($xmlns as $prefix => $namespaceURI) {
$xpath->registerNamespace($prefix, $namespaceURI);
}
$list = $xpath->evaluate('//return:RT/*');
foreach ($list as $fiveF) {
// serialize the node into a string for debugging
var_dump($document->saveXML($fiveF));
// directly fetch the value using a string cast in Xpath
var_dump(
$xpath->evaluate(
'string(master:Info/master:SWCreatedBy)',
$fiveF
)
);
}
Using DOM, you can take a look at how the parser reads the broken tag:
// the RETURN:RT is valid - the parser can match it to the namespace
var_dump($document->documentElement->localName, $document->documentElement->namespaceURI);
foreach ($list as $fiveF) {
// the "5F" or "FORMM:5F" is invalid - the parser will not match it
var_dump($fiveF->localName, $fiveF->namespaceURI);
}
Output:
string(2) "RT"
string(18) "http://example.com"
string(8) "FORMM:5F"
NULL
So it keeps the prefix as part of the local name and does not resolve the namespace.
I have the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<gnm:Workbook xmlns:gnm="http://www.gnumeric.org/v10.dtd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.gnumeric.org/v9.xsd">
<office:document-meta xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:ooo="http://openoffice.org/2004/office" office:version="1.1">
<office:meta>
<dc:creator>Mark Baker</dc:creator>
<dc:date>2010-09-01T22:49:33Z</dc:date>
<meta:creation-date>2010-09-01T22:48:39Z</meta:creation-date>
<meta:editing-cycles>4</meta:editing-cycles>
<meta:editing-duration>PT00H04M20S</meta:editing-duration>
<meta:generator>OpenOffice.org/3.1$Win32 OpenOffice.org_project/310m11$Build-9399</meta:generator>
</office:meta>
</office:document-meta>
</gnm:Workbook>
I am trying to read the office:document-meta node to extract the various elements below it (dc:creator, meta:creation-date, etc.).
The following code:
$xml = simplexml_load_string($gFileData);
$namespacesMeta = $xml->getNamespaces(true);
$officeXML = $xml->children($namespacesMeta['office']);
var_dump($officeXML);
echo '<hr />';
gives me:
object(SimpleXMLElement)[91]
public 'document-meta' =>
object(SimpleXMLElement)[93]
public '#attributes' =>
array
'version' => string '1.1' (length=3)
public 'meta' =>
object(SimpleXMLElement)[94]
But if I try to read the document-meta element using:
$xml = simplexml_load_string($gFileData);
$namespacesMeta = $xml->getNamespaces(true);
$officeXML = $xml->children($namespacesMeta['office']);
$docMeta = $officeXML->document-meta;
var_dump($docMeta);
echo '<hr />';
I get
Notice: Use of undefined constant meta - assumed 'meta' in /usr/local/apache/htdocsNewDev/PHPExcel/Classes/PHPExcel/Reader/Gnumeric.php on line 273
int 0
I assume that SimpleXML is trying to extract a non-existent node "document" from $officeXML and then subtract the value of the (non-existent) constant "meta", which produces the integer 0 instead of the document-meta node.
Is there a way to resolve this using SimpleXML, or will I be forced to rewrite using XMLReader? Any help appreciated.
Your assumption is correct. Use
$officeXML->{'document-meta'}
to make it work.
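Putting this together with namespaced access, the elements below document-meta can be read by passing each namespace URI to children(). This is a sketch under a few assumptions: the constant names are my own, the URIs are copied from the question's XML, and the XML is trimmed to a few fields.

```php
<?php
// Namespace URIs copied from the question's XML; the constant names are illustrative
const NS_OFFICE = 'urn:oasis:names:tc:opendocument:xmlns:office:1.0';
const NS_DC = 'http://purl.org/dc/elements/1.1/';
const NS_META = 'urn:oasis:names:tc:opendocument:xmlns:meta:1.0';

$gFileData = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<gnm:Workbook xmlns:gnm="http://www.gnumeric.org/v10.dtd">
  <office:document-meta xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" office:version="1.1">
    <office:meta>
      <dc:creator>Mark Baker</dc:creator>
      <meta:creation-date>2010-09-01T22:48:39Z</meta:creation-date>
      <meta:editing-cycles>4</meta:editing-cycles>
    </office:meta>
  </office:document-meta>
</gnm:Workbook>
XML;

$xml = simplexml_load_string($gFileData);

// Braces are needed because "document-meta" is not a valid PHP identifier;
// ->meta stays in the office namespace selected by children()
$docMeta = $xml->children(NS_OFFICE)->{'document-meta'};
$officeMeta = $docMeta->meta;

$creator = (string)$officeMeta->children(NS_DC)->creator;
$created = (string)$officeMeta->children(NS_META)->{'creation-date'};
$cycles = (int)$officeMeta->children(NS_META)->{'editing-cycles'};

// office:version is a prefixed attribute, so it is in the office namespace
$version = (string)$docMeta->attributes(NS_OFFICE)->version;

printf("%s, %s, %d cycles, version %s\n", $creator, $created, $cycles, $version);
```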
Please note that the above applies to element nodes. Attribute nodes (those within the #attributes property when dumping the SimpleXMLElement) do not require any special syntax to be accessed when hyphenated. They are regularly accessible via array notation, e.g.:
$xml = <<< XML
<root>
<hyphenated-element hyphenated-attribute="bar">foo</hyphenated-element>
</root>
XML;
$root = new SimpleXMLElement($xml);
echo $root->{'hyphenated-element'}; // prints "foo"
echo $root->{'hyphenated-element'}['hyphenated-attribute']; // prints "bar"
See the SimpleXml Basics in the Manual for further examples.
I assume the best way to do it is to cast to array:
Consider the following XML:
<subscribe hello-world="yolo">
<callback-url>example url</callback-url>
</subscribe>
You can access members, including attributes, using a cast:
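<?php
$input = '<subscribe hello-world="yolo"><callback-url>example url</callback-url></subscribe>';
$xml = (array) simplexml_load_string($input);
$callback = $xml['callback-url'];                // "example url"
$attribute = $xml['@attributes']['hello-world']; // "yolo"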
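It makes everything easier. Keep in mind that the cast is shallow (elements that have children of their own stay SimpleXMLElement objects) and, like property access, it does not include children or attributes in a prefixed namespace, so for namespaced XML like the one in the question you still need children() or attributes() with the namespace URI first. Hope I helped.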
I am trying to access a list of unprefixed nodes nested inside nodes that use a namespace prefix. My XML file has a root node in the ehd namespace with two child nodes, header and body, in the same namespace. However, the nodes inside body have no namespace prefix. I am struggling to access these nodes with SimpleXML.
Excerpt from the XML file:
<?xml version="1.0" encoding="ISO-8859-15"?>
<ehd:ehd ehd_version="1.40" xmlns:ehd="urn:ehd/001" xmlns="urn:ehd/go/001">
<ehd:header>
</ehd:header>
<ehd:body>
<gnr_liste>
<gnr V="01100"></gnr>
<gnr V="01101"></gnr>
<gnr V="01102"></gnr>
</gnr_liste>
</ehd:body>
</ehd:ehd>
My code is as follows:
$xml = simplexml_load_file($file) or die("Failed to load");
$ehd = $xml->children('ehd', true)->body;
simplexml_dump($ehd);
$gnr_liste = $ehd->children('gnr_liste')->children('gnr');
simplexml_dump($gnr_liste);
The output is:
SimpleXML object (1 item)
[
Element {
Namespace: 'urn:ehd/001'
Namespace Alias: 'ehd'
Name: 'ehd'
String Content: ''
Content in Namespace ehd
Namespace URI: 'urn:ehd/001'
Children: 2 - 1 'body', 1 'header'
Attributes: 0
Content in Default Namespace
Children: 0
Attributes: 1 - 'ehd_version'
}
]
SimpleXML object (1 item)
[
Element {
Namespace: 'urn:ehd/001'
Namespace Alias: 'ehd'
Name: 'body'
String Content: ''
Content in Default Namespace
Namespace URI: 'urn:ehd/go/001'
Children: 1 - 1 'gnr_liste'
Attributes: 0
}
]
How do I access all gnr items from the gnr_liste node?
Note: I am using simplexml_dump for debugging
The argument to ->children() is always a namespace identifier or local prefix, never the tag name. If these elements were in "no namespace", you would access them with ->children('').
However, the unprefixed elements in this document are not in "no namespace": they are in the default namespace, in this case urn:ehd/go/001 (as declared by xmlns="urn:ehd/go/001").
If you use the full namespace identifiers rather than the prefixes (which is also less likely to break if the feed changes), you should be able to access these easily:
$xml = simplexml_load_file($file) or die("Failed to load");
$ehd = $xml->children('urn:ehd/001')->body;
$gnr_liste = $ehd->children('urn:ehd/go/001')->gnr_liste;
foreach ( $gnr_liste->gnr as $gnr ) {
simplexml_dump($gnr);
}
You might want to give the namespaces your own names, so that you don't have to repeat the full URIs but also don't depend on the prefixes the XML was generated with. A common approach is to define constants:
const XMLNS_EHD_MAIN = 'urn:ehd/001';
const XMLNS_EHD_GNR = 'urn:ehd/go/001';
$xml = simplexml_load_file($file) or die("Failed to load");
$ehd = $xml->children(XMLNS_EHD_MAIN)->body;
$gnr_liste = $ehd->children(XMLNS_EHD_GNR)->gnr_liste;
foreach ( $gnr_liste->gnr as $gnr ) {
simplexml_dump($gnr);
}
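The gnr elements carry their data in the V attribute, so here is a sketch that collects those values with SimpleXML alone (no simplexml_dump). Assumptions: the XML is the question's excerpt without the ISO-8859-15 declaration, and I read V through attributes() with no argument, which selects attributes in no namespace. Unprefixed attributes are never in the default namespace, and I prefer this over $gnr['V'] here because the element was reached through children() with a namespace URI.

```php
<?php
// Namespace URIs from the question's XML; the constant names are illustrative
const XMLNS_EHD_MAIN = 'urn:ehd/001';
const XMLNS_EHD_GNR = 'urn:ehd/go/001';

$xmlString = <<<'XML'
<ehd:ehd ehd_version="1.40" xmlns:ehd="urn:ehd/001" xmlns="urn:ehd/go/001">
  <ehd:header>
  </ehd:header>
  <ehd:body>
    <gnr_liste>
      <gnr V="01100"></gnr>
      <gnr V="01101"></gnr>
      <gnr V="01102"></gnr>
    </gnr_liste>
  </ehd:body>
</ehd:ehd>
XML;

$xml = simplexml_load_string($xmlString);
$gnrListe = $xml->children(XMLNS_EHD_MAIN)->body->children(XMLNS_EHD_GNR)->gnr_liste;

$values = [];
foreach ($gnrListe->gnr as $gnr) {
    // V has no prefix, so it is in no namespace; attributes() without an argument selects those
    $values[] = (string)$gnr->attributes()->V;
}
print_r($values);
```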
Personally, I find DOMDocument much more intuitive to work with, once you get over the barrier of XPath syntax. No matter what tool you use, though, namespaces are going to make everything more difficult!
$xml = <<< XML
<?xml version="1.0" encoding="ISO-8859-15"?>
<ehd:ehd ehd_version="1.40" xmlns:ehd="urn:ehd/001" xmlns="urn:ehd/go/001">
<ehd:header>
</ehd:header>
<ehd:body>
<gnr_liste>
<gnr V="01100"></gnr>
<gnr V="01101"></gnr>
<gnr V="01102"></gnr>
</gnr_liste>
</ehd:body>
</ehd:ehd>
XML;
$dom = new DomDocument;
$dom->loadXML($xml);
$xp = new DomXPath($dom);
// need to get tricky due to namespaces https://stackoverflow.com/a/16719351/1255289
$nodes = $xp->query("//*[local-name()='gnr']/@V");
foreach ($nodes as $node) {
printf("%s\n", $node->value);
}
Output:
01100
01101
01102
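The local-name() workaround ignores namespaces entirely. An alternative sketch registers a prefix for the default namespace on the DOMXPath object, since XPath 1.0 has no notion of a default namespace. The prefix "go" is my own choice, and the XML is trimmed from the question (declaration and header omitted).

```php
<?php
$xml = <<<'XML'
<ehd:ehd ehd_version="1.40" xmlns:ehd="urn:ehd/001" xmlns="urn:ehd/go/001">
  <ehd:body>
    <gnr_liste>
      <gnr V="01100"></gnr>
      <gnr V="01101"></gnr>
      <gnr V="01102"></gnr>
    </gnr_liste>
  </ehd:body>
</ehd:ehd>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xp = new DOMXPath($dom);
// Any prefix works; only the namespace URI has to match the document
$xp->registerNamespace('ehd', 'urn:ehd/001');
$xp->registerNamespace('go', 'urn:ehd/go/001');

$values = [];
foreach ($xp->evaluate('/ehd:ehd/ehd:body/go:gnr_liste/go:gnr/@V') as $attr) {
    // Each result is a DOMAttr; V itself is in no namespace, so it needs no prefix
    $values[] = $attr->value;
}
echo implode("\n", $values), "\n";
```

The path stays precise about which namespace each element belongs to, which local-name() cannot guarantee if another namespace ever uses the same element names.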