XML Namespaces with PHP's xmlwriter - php

Firstly can you tell me whether this xml:
<adf:source xsi:schemaLocation="http://www.rightmove.co.uk/adf/rightmoveV4n.xsd rightmoveV4n.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:adf="http://www.rightmove.co.uk/adf/rightmoveV4n.xsd">
</source>
Is it correct? I can't see how the document can start with <adf:source> and close with </source>; that doesn't seem right to me.
I have replicated the structure using my own data but cannot get PHP's XMLWriter() to close the document with just </source> - it closes it with </adf:source>.
I'm doing:
$xml = new XMLWriter();
$xml->openMemory();
$xml->startDocument();
$xml->startElementNS("adf", "source", "http://www.rightmove.co.uk/adf/rightmoveV4n.xsd");
$xml->writeAttributeNS ("xsi", "schemaLocation", "http://www.w3.org/2001/XMLSchema-instance", "http://www.rightmove.co.uk/adf/rightmoveV4n.xsd rightmoveV4n.xsd");
and then eventually
$xml->endElement ();
echo $xml->outputMemory();

No, your XML is not well-formed. The root element of an XML document must be opened and closed with the same qualified name, prefix included. As far as an XML parser is concerned, <adf:source> and <source> are entirely different.
The adf: in front of the source element is a so-called namespace prefix, which is like a shorthand way of saying: "This element belongs to the namespace http://www.rightmove.co.uk/adf/rightmoveV4n.xsd".
So, the behaviour of XMLWriter() is to be expected and perfectly fine. On the other hand, an application that produces the XML document you have shown is clearly in error.
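To see this for yourself, here is a minimal sketch built on the code from the question. It uses fullEndElement() instead of endElement() only so that an explicit closing tag is written for the empty element, and it re-parses the result with DOMDocument as a well-formedness check; that check is my addition, not part of the original code.

```php
<?php
$adfNs = 'http://www.rightmove.co.uk/adf/rightmoveV4n.xsd';

$xml = new XMLWriter();
$xml->openMemory();
$xml->startDocument('1.0', 'UTF-8');
// Opens <adf:source> and declares xmlns:adf on it.
$xml->startElementNS('adf', 'source', $adfNs);
$xml->writeAttributeNS(
    'xsi',
    'schemaLocation',
    'http://www.w3.org/2001/XMLSchema-instance',
    $adfNs . ' rightmoveV4n.xsd'
);
// fullEndElement() always writes an explicit closing tag, even for an empty element.
$xml->fullEndElement();
$xml->endDocument();
$out = $xml->outputMemory();

echo $out;

// Re-parse the output: a mismatched closing tag would make loadXML() fail.
$doc = new DOMDocument();
$wellFormed = $doc->loadXML($out);
$rootName = $doc->documentElement->nodeName;
```

The closing tag carries the same prefix as the opening tag, which is exactly what a parser needs in order to accept the document.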

Related

How to echo an PHP SimpleXMLElement

I am learning SimpleXML in PHP. When I run a simple test with SimpleXMLElement(...), I don't get anything back. Let me explain. Here is the XML file:
<?xml version="1.0" encoding="UTF-8"?>
<movies>
<movie>
<title>PHP: Behind the Parser</title>
<plot>
So, this language. It's like, a programming language. Or is it a
scripting language? All is revealed in this thrilling horror spoof
of a documentary.
</plot>
<great-lines>
<line>PHP solves all my web problems</line>
</great-lines>
<rating type="thumbs">7</rating>
<rating type="stars">5</rating>
</movie>
</movies>
And here is my php file:
<?php
$xml = simplexml_load_file('example.xml');
echo $xml->getName() . "<br>"; // prints "movies"
$movies = new SimpleXMLElement($xml);
echo $movies->getName() . "...<br>"; // doesn't print anything, not even the dots
echo $movies->movie[0]->plot; // even this does not print anything
?>
Only output is:
movies
Please read the comments in the PHP file. I am trying to print the XML elements in exactly the same way after loading the file and after creating a new SimpleXMLElement object. Somehow only the first echo prints anything. I have searched through many examples and could not make it work. Where is the mistake? It is a big puzzle for me, but maybe a tiny one for you.
simplexml_load_file already returns your SimpleXMLElement object. Try this:
<?php
$xml = simplexml_load_file('example.xml');
echo $xml->getName() . "<br>";
echo $xml->movie[0]->plot . "<br>\n";
?>
change this line:
$movies = new SimpleXMLElement($xml);
to this:
$movies = new SimpleXMLElement($xml->asXML());
What you are trying to do doesn't make much sense, because you are trying to load the same XML twice:
// this loads the XML from a file, giving you a SimpleXMLElement object:
$xml = simplexml_load_file('example.xml');
// this line would do what? load the XML from the XML?
$movies = new SimpleXMLElement($xml);
There are two functions for loading XML in the SimpleXML extension, both return SimpleXMLElement objects:
simplexml_load_file - takes a filename, and loads the XML in that file; with the right PHP settings, you can also give it a URL, and it will load the XML straight from there
simplexml_load_string - takes a string of XML that you've already got from somewhere else, and loads that
The third way of getting a SimpleXMLElement is calling the class's constructor (i.e. writing new SimpleXMLElement). This can actually act like either of the above: by default, it expects a string of XML (like simplexml_load_string), but you can also set the 3rd parameter to true to say that it's a path or URL (like simplexml_load_file).
The result of all three of these methods is exactly the same, they're just different ways of getting there depending on what you currently have (and, to some extent, how you want your code to look).
As a side-note, there are two more functions which do take an object of XML you've already parsed: simplexml_import_dom and dom_import_simplexml. These are actually pretty cool, because the DOM is a standard, comprehensive, but rather fiddly and verbose way of acting on XML, whereas SimpleXML is, well, simple - and using these functions you can actually use both with very little penalty, because they just change the wrapper of the object without having to re-parse the underlying XML.
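A rough sketch of the routes described above, side by side. The inline XML and the temporary file are just stand-ins for example.xml so that the snippet runs on its own; they are my assumptions, not part of the question.

```php
<?php
$source = '<movies><movie><title>PHP: Behind the Parser</title></movie></movies>';

// 1. Function form: parse a string of XML.
$fromString = simplexml_load_string($source);

// 2. Constructor form: by default it also expects a string of XML.
$fromConstructor = new SimpleXMLElement($source);

// 3. File forms: write the XML to a temporary file (stand-in for example.xml),
//    then load it by path with either the function or the constructor.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, $source);
$fromFile = simplexml_load_file($path);
$fromPath = new SimpleXMLElement($path, 0, true); // 3rd argument: data is a path/URL
unlink($path);

// 4. Already parsed: switch between the SimpleXML and DOM wrappers without re-parsing.
$domNode = dom_import_simplexml($fromString);   // DOMElement
$backAgain = simplexml_import_dom($domNode);    // SimpleXMLElement

echo $fromString->getName(), "\n";              // movies
echo $fromConstructor->movie[0]->title, "\n";   // PHP: Behind the Parser
echo $fromFile->movie[0]->title, "\n";          // PHP: Behind the Parser
echo $fromPath->movie[0]->title, "\n";          // PHP: Behind the Parser
echo $domNode->nodeName, "\n";                  // movies
echo $backAgain->movie[0]->title, "\n";         // PHP: Behind the Parser
```

In the question's code, none of these routes is needed a second time: the object returned by simplexml_load_file is already the SimpleXMLElement you want to work with.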
try this
<?php
$movies = simplexml_load_file('sample.xml');
foreach($movies as $key=>$val)
{
echo $val->title.'<br>';
echo $val->plot.'<br>';
echo $val->rating[0];
echo $val->rating[1];
}
?>

SimpleXML parent - child issue

I am having an issue parsing an XML file using SimpleXML and PHP.
The XML file in question is provided by a third party and includes a number of child elements (going down multiple levels). I know which elements I require and can see them in the XML file, but I just can't seem to get them to print using PHP.
Example XML feed for test.xml:
<?xml version="1.0" encoding="utf-8"?>
<Element1 xmlns="" release="8.1" environment="Production" lang="en-US">
<Element2>
<Element3>
<Element4>
<Element5>it worked</Element5>
</Element4>
</Element3>
</Element2>
</Element1>
The file only includes one of each element, so I can be very specific with the request. The code I have so far is below:
$lib=simplexml_load_file("test.xml");
$make=$lib->Element1->Element2->Element3->Element4->Element5;
print $make;
I have tried to look this up before asking, but the only solutions I can find are for cases where the child elements are unknown or there are multiple results for each request, which is not the case here.
Any help or guidance would be gratefully received.
Thanks
In your code above, $lib is Element1. So you just need to drop one of your references. This:
$make=$lib->Element1->Element2->Element3->Element4->Element5;
Should become this:
$make=$lib->Element2->Element3->Element4->Element5;
Also, SimpleXML is an awful awful awful awful interface (considering that "Simple" is in the name and there is mass confusion about how to use it). I would always recommend DOMDocument instead.
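A small sketch of both points, assuming a trimmed-down inline copy of test.xml (I left out some of the root attributes for brevity): getName() shows that the loaded object already is Element1, and the DOMDocument lookup is one possible equivalent, not the only way to write it.

```php
<?php
$source = '<?xml version="1.0" encoding="utf-8"?>
<Element1 release="8.1" environment="Production">
<Element2><Element3><Element4>
<Element5>it worked</Element5>
</Element4></Element3></Element2>
</Element1>';

// SimpleXML: the object returned by the loader is the root element itself.
$lib = simplexml_load_string($source);
echo $lib->getName(), "\n"; // Element1
$viaSimpleXml = (string) $lib->Element2->Element3->Element4->Element5;
echo $viaSimpleXml, "\n";   // it worked

// DOMDocument: look the element up by tag name and read its text content.
$dom = new DOMDocument();
$dom->loadXML($source);
$viaDom = $dom->getElementsByTagName('Element5')->item(0)->textContent;
echo $viaDom, "\n";         // it worked
```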
I'd strongly recommend using XPath, as it gives you more flexibility, e.g. it allows you to restrict results based on XML node attributes.
$xml = simplexml_load_string('<?xml version="1.0" encoding="utf-8"?>
<Element1 xmlns="" release="8.1" environment="Production" lang="en-US">
<Element2>
<Element3>
<Element4>
<Element5>it worked</Element5>
</Element4>
</Element3>
</Element2>
</Element1>');
$data=$xml->xpath('/Element1/Element2/Element3/Element4/Element5');
echo (string)$data[0]; //outputs 'it worked'
//this also works
$data=$xml->xpath('//Element5');
echo (string)$data[0]; //outputs 'it worked'
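As a sketch of the attribute restriction mentioned above: the predicate on the environment attribute is my own illustration, and the "Staging" value is hypothetical, used only to show a non-matching case.

```php
<?php
$xml = simplexml_load_string(
    '<Element1 release="8.1" environment="Production" lang="en-US">'
    . '<Element2><Element3><Element4><Element5>it worked</Element5></Element4></Element3></Element2>'
    . '</Element1>'
);

// Only match Element5 when the root element has environment="Production".
$hit = $xml->xpath('/Element1[@environment="Production"]//Element5');

// Same query with a value that is not in the document: nothing matches.
$miss = $xml->xpath('/Element1[@environment="Staging"]//Element5');

echo count($hit), "\n";        // 1
echo (string) $hit[0], "\n";   // it worked
echo count($miss), "\n";       // 0
```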

Can't access XML node via xpath() (YT channel feed)

Very stumped by this one. In PHP, I'm fetching a YouTube user's vids feed and trying to access the nodes, like so:
$url = 'http://gdata.youtube.com/feeds/api/users/HCAFCOfficial/uploads';
$xml = simplexml_load_file($url);
So far, so fine. Really basic stuff. I can see the data comes back by running:
echo '<p>Found '.count($xml->xpath('*')).' nodes.</p>'; //41
echo '<textarea>';print_r($xml);echo '</textarea>';
Both print what I would expect, and the print_r replicates the XML structure.
However, I have no idea why this is returning zero:
echo '<p>Found '.count($xml->xpath('entry')).' "entry" nodes.</p>';
There blatantly are entry nodes in the XML. This is confirmed by running:
foreach($xml->xpath('*') as $node) echo '<p>['.$node->getName().']</p>';
...which duly outputs "[entry]" 25 times. So perhaps this is a bug in SimpleXML? This is part of a wider feed caching system and I'm not having any trouble with other, non-YT feeds, only YT ones.
[UPDATE]
This question shows that it works if you do
count($xml->entry)
But I'm curious as to why count($xml->xpath('entry')) doesn't also work...
[Update 2]
I can happily traverse YT's alternate feed format just fine:
http://gdata.youtube.com/feeds/base/users/{user id}/uploads?alt=rss&v=2
This is happening because the feed is an Atom document with a defined default namespace.
<feed xmlns="http://www.w3.org/2005/Atom" ...
Since a namespace is defined, you have to define it for your xpath call too. Doing something like this works:
$url = 'http://gdata.youtube.com/feeds/api/users/HCAFCOfficial/uploads';
$xml = simplexml_load_file($url);
$xml->registerXPathNamespace('ns', 'http://www.w3.org/2005/Atom');
$results = $xml->xpath('ns:entry');
echo count($results);
The main thing to know here is that SimpleXML respects any and all defined namespaces and you need to handle them accordingly, including the default namespace. You'll notice that the second feed you listed does not define a default namespace and so the xpath call works fine as is.
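Here is a self-contained sketch of the same effect, using a tiny hand-written Atom document in place of the YouTube feed (the feed content is made up for illustration; only the namespace URI comes from the answer above).

```php
<?php
$atom = '<feed xmlns="http://www.w3.org/2005/Atom">'
    . '<title>Demo feed</title>'
    . '<entry><title>One</title></entry>'
    . '<entry><title>Two</title></entry>'
    . '</feed>';

$xml = simplexml_load_string($atom);

// Unprefixed XPath names only match elements in no namespace, so this finds nothing.
$plain = $xml->xpath('entry');

// Bind a prefix of our choosing to the Atom namespace, then use it in the query.
$xml->registerXPathNamespace('ns', 'http://www.w3.org/2005/Atom');
$prefixed = $xml->xpath('ns:entry');

echo count($plain), "\n";       // 0
echo count($prefixed), "\n";    // 2
echo count($xml->entry), "\n";  // 2, property access works as noted in the question
```

The prefix in the query ('ns' here) is arbitrary; what has to match is the namespace URI.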

Parsing Zimbra SOAP response with SimpleXML and xpath

So, I'm using PHP to talk to a Zimbra SOAP server. The response is in a <soap:Envelope> tag. I'm having trouble parsing the XML response because of the namespace(s).
The XML looks like this:
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Header>
<context xmlns="urn:zimbra">
<change token="20333"/>
</context>
</soap:Header>
<soap:Body>
<CreateAccountResponse xmlns="urn:zimbraAdmin">
<account id="83ebf344-dc51-47ae-9a36-3eb24281d53e" name="iamtesting@example.com">
<a n="zimbraId">83ebf344-dc51-47ae-9a36-3eb24281d53e</a>
<a n="zimbraMailDeliveryAddress">iamtesting@example.com</a>
</account>
</CreateAccountResponse>
</soap:Body>
</soap:Envelope>
I make a new SimpleXMLElement object:
$xml = new SimpleXMLElement($data);
After Googling a bit, I found I need to register the namespace. So I do that:
$xml->registerXPathNamespace('soap', 'http://www.w3.org/2003/05/soap-envelope');
Then I can get the <soap:Body> tag easily.
$body = $xml->xpath('//soap:Body');
But I can't get any elements after that (using xpath):
$CreateAccountResponse = $xml->xpath('//soap:Body/CreateAccountResponse');
This returns an empty array. I can traverse the XML though, to get that element.
$CreateAccountResponse = $body[0]->CreateAccountResponse;
This works fine, but now I want to get the <a> tags, specifically the zimbraId one. So I tried this:
$zimbraId = $CreateAccountResponse->account->xpath('a[@n=zimbraId]');
No luck, I get a blank array. What's going on? Why can't I use xpath to get elements (that don't start with soap:)?
How can I get the <a> tags based on their n attribute?
P.S. I'm aware that the id and name are also in the <account> tag's attributes, but there are a bunch more <a> tags that I want to get using the n attribute.
Note: I'm trying to improve the Zimbra library for my application for work. The current code to get the <a> tags is as follows:
$zimbraId = strstr($data, "<a n=\"zimbraId\"");
$zimbraId = strstr($zimbraId, ">");
$zimbraId = substr($zimbraId, 1, strpos($zimbraId, "<") - 1);
Obviously, I want to remove this code (there's also some regexes (shudder) later on in the code), and use an XML parser.
The elements you want to retrieve have a namespace as well, namely urn:zimbraAdmin.
<CreateAccountResponse xmlns="urn:zimbraAdmin">
The xmlns attribute states the default namespace for that element and its unprefixed descendants, so the elements you are trying to retrieve actually have a namespace, even though no prefix is used (see the Wikipedia article for some examples). If you register a namespace prefix as you did for http://www.w3.org/2003/05/soap-envelope you should be fine.
$xml->registerXPathNamespace('soap', 'http://www.w3.org/2003/05/soap-envelope');
$xml->registerXPathNamespace('zimbra', 'urn:zimbraAdmin');
$CreateAccountResponse = $xml->xpath('//soap:Body/zimbra:CreateAccountResponse');
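To get the <a> tags by their n attribute, the same idea extends one step further. This is a sketch under the assumption that the response looks like the one in the question; the 'zimbra' prefix is just the name chosen above. Note that the attribute itself is unprefixed and so in no namespace, and that the value has to be quoted inside the predicate.

```php
<?php
$data = <<<XML
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Body>
<CreateAccountResponse xmlns="urn:zimbraAdmin">
<account id="83ebf344-dc51-47ae-9a36-3eb24281d53e" name="iamtesting@example.com">
<a n="zimbraId">83ebf344-dc51-47ae-9a36-3eb24281d53e</a>
<a n="zimbraMailDeliveryAddress">iamtesting@example.com</a>
</account>
</CreateAccountResponse>
</soap:Body>
</soap:Envelope>
XML;

$xml = new SimpleXMLElement($data);
$xml->registerXPathNamespace('soap', 'http://www.w3.org/2003/05/soap-envelope');
$xml->registerXPathNamespace('zimbra', 'urn:zimbraAdmin');

// Every element below CreateAccountResponse inherits its default namespace,
// so each step needs the prefix; the attribute @n does not.
$matches = $xml->xpath(
    '//soap:Body/zimbra:CreateAccountResponse/zimbra:account/zimbra:a[@n="zimbraId"]'
);

$zimbraId = (string) $matches[0];
echo $zimbraId, "\n"; // 83ebf344-dc51-47ae-9a36-3eb24281d53e
```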

XML validation against given DTD in PHP

In PHP, I am trying to validate an XML document using a DTD specified by my application - not by the externally fetched XML document. The validate method in the DOMDocument class seems to only validate using the DTD specified by the XML document itself, so this will not work.
Can this be done, and how, or do I have to translate my DTD to an XML schema so I can use the schemaValidate method?
(This seems to have been asked in "Validate XML using a custom DTD in PHP", but without a correct answer, since the solution there only relies on the DTD specified by the target XML.)
Note: XML validation could be subject to the Billion Laughs attack, and similar DoS vectors.
This essentially does what rojoca mentioned in his comment:
<?php
$xml = <<<END
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo SYSTEM "foo.dtd">
<foo>
<bar>baz</bar>
</foo>
END;
$root = 'foo';
$old = new DOMDocument;
$old->loadXML($xml);
$creator = new DOMImplementation;
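// Empty strings rather than null: these parameters are non-nullable strings, and PHP 8.1+ deprecates passing null to them.
$doctype = $creator->createDocumentType($root, '', 'bar.dtd');
$new = $creator->createDocument(null, '', $doctype);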
$new->encoding = "utf-8";
$oldNode = $old->getElementsByTagName($root)->item(0);
$newNode = $new->importNode($oldNode, true);
$new->appendChild($newNode);
$new->validate();
?>
This will validate the document against the bar.dtd.
You can't just call $new->loadXML(), because that would just set the DTD to the original, and the doctype property of a DOMDocument object is read-only, so you have to copy the root node (with everything in it) to a new DOM document.
I only just had a go with this myself, so I'm not entirely sure if this covers everything, but it definitely works for the XML in my example.
Of course, the quick-and-dirty solution would be to first get the XML as a string, search and replace the original DTD by your own DTD and then load it.
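A rough sketch of that quick-and-dirty route. To keep it runnable without any files, it swaps the original DOCTYPE for one that carries my own DTD inline as an internal subset, which is a variation on the idea rather than exactly what is described above. The helper name validateAgainst is hypothetical, and the simple regex assumes the incoming DOCTYPE has no internal subset of its own.

```php
<?php
// Replace the document's DOCTYPE with our own, then load and validate.
function validateAgainst(string $xml, string $doctype): bool
{
    // Swap the first DOCTYPE declaration; a callback avoids any special
    // handling of "$" or "\" in the replacement text.
    $patched = preg_replace_callback(
        '/<!DOCTYPE[^>]*>/',
        function () use ($doctype) {
            return $doctype;
        },
        $xml,
        1
    );

    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($patched) && $doc->validate();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

// The application's own DTD, written as an internal subset.
$ownDoctype = '<!DOCTYPE foo [
<!ELEMENT foo (bar)>
<!ELEMENT bar (#PCDATA)>
]>';

$good = '<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo SYSTEM "foo.dtd">
<foo>
<bar>baz</bar>
</foo>';

$bad = '<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo SYSTEM "foo.dtd">
<foo>
<qux>baz</qux>
</foo>';

$goodResult = validateAgainst($good, $ownDoctype);
$badResult = validateAgainst($bad, $ownDoctype);

var_dump($goodResult); // bool(true)
var_dump($badResult);  // bool(false)
```

Because the DOCTYPE is replaced before parsing, the DTD named by the incoming document is never consulted.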
I think that's only possible with XSD, see:
http://php.net/manual/en/domdocument.schemavalidate#62032
