Xpath in PHP with OTA standards - php

I have basic knowledge of XPath in PHP, but I'm having trouble with a specific case, and I think the problem lies in the standards.
This is the snippet of the XML and it's based on the OTA standards:
<SendHotelResResult xmlns:a="http://schemas/Models/OTA" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<a:RoomRates>
<a:RoomRate>
<a:EffectiveDate>2015-11-13T00:00:00</a:EffectiveDate>
<a:ExpireDate>2015-11-15T00:00:00</a:ExpireDate>
<a:RatePlanID>25</a:RatePlanID>
<a:RatesType>
<a:Rates>
<a:Rate>
<a:AgeQualifyingCode i:nil="true"/>
<a:EffectiveDate>2015-11-13T00:00:00</a:EffectiveDate>
<a:Total>
<a:AmountAfterTax>0</a:AmountAfterTax>
<a:AmountBeforeTax>260.00</a:AmountBeforeTax>
<a:CurrencyCode>EUR</a:CurrencyCode>
</a:Total>
</a:Rate>
<a:Rate>
<a:AgeQualifyingCode i:nil="true"/>
<a:EffectiveDate>2015-11-14T00:00:00</a:EffectiveDate>
<a:Total>
<a:AmountAfterTax>0</a:AmountAfterTax>
<a:AmountBeforeTax>260.00</a:AmountBeforeTax>
<a:CurrencyCode>EUR</a:CurrencyCode>
</a:Total>
</a:Rate>
</a:Rates>
</a:RatesType>
<a:RoomID>52</a:RoomID>
<a:Total>
<a:AmountAfterTax>546.00</a:AmountAfterTax>
<a:AmountBeforeTax>520.00</a:AmountBeforeTax>
<a:CurrencyCode>EUR</a:CurrencyCode>
</a:Total>
</a:RoomRate>
</a:RoomRates>
</SendHotelResResult>
What I want:
Get a specific <RoomRate> tag based on the element <RoomID>.
Get the global RoomRate <Total> tag. I don't want the <Total> tag that is inside the <Rate> tag. This is why I'm using XPath rather than a simple getElementsByTagName('Total'). I don't know whether the OTA standards have some way to differentiate the Total tags.
My attempts until now:
$dom = new DOMDocument();
$response = $dom->load($xmlSendHotelRes);
$roomID = '52';
$roomRatesTag = $response->getElementsByTagName('RoomRates')->item(0);
$prefix = $roomRatesTag->prefix;
$namespace = $roomRatesTag->lookupNamespaceURI($prefix);
$xpath = new DOMXpath($dom);
$xpath->registerNamespace($prefix, $namespace);
$roomRateTotal = $xpath->query("//RoomRate[RoomID=$roomID]/Total", $roomRatesTag, true);
I already tried with and without $roomRatesTag as context and also other expressions like:
./RoomRate[RoomID=$roomID]/Total, //RoomRate[RoomID=$roomID]/Total, //RoomRate/[RoomID=$roomID]/Total and //RoomRate/RoomID[text() = $roomID]/../Total, but none of them works.
Actually, even $roomRate = $xpath->query("//RoomRate"); returns an empty DOMNodeList, so I don't know what I'm doing wrong. I suspect the problem is that the standards put 2 identical tags in different places, although this does not make much sense.
Are there some other expressions that I need to try?

You're fetching the namespace from the document.
$prefix = $roomRatesTag->prefix;
$namespace = $roomRatesTag->lookupNamespaceURI($prefix);
But this is neither necessary nor a good idea. You know that the document uses OTA, so you know the namespace is http://schemas/Models/OTA.
The prefix is just an alias for the actual namespace value. The following 3 XML examples all resolve to a node {http://schemas/Models/OTA}RoomRates
<a:RoomRates xmlns:a="http://schemas/Models/OTA"/>
<ota:RoomRates xmlns:ota="http://schemas/Models/OTA"/>
<RoomRates xmlns="http://schemas/Models/OTA"/>
Your Api has to look for nodes inside the namespace.
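To check this yourself, here is a minimal sketch that loads the three spellings above and compares what DOM reports for each. The variable names ($samples, $resolved) are only for illustration.

```php
<?php
// Three spellings of the same element; only the prefix differs.
$samples = [
    '<a:RoomRates xmlns:a="http://schemas/Models/OTA"/>',
    '<ota:RoomRates xmlns:ota="http://schemas/Models/OTA"/>',
    '<RoomRates xmlns="http://schemas/Models/OTA"/>',
];

$resolved = [];
foreach ($samples as $sample) {
    $document = new DOMDocument();
    $document->loadXML($sample);
    $node = $document->documentElement;
    // namespaceURI plus localName identify the node; the prefix is only an alias.
    $resolved[] = '{' . $node->namespaceURI . '}' . $node->localName;
}

// All three entries should be the same expanded name.
var_dump(count(array_unique($resolved)) === 1);
```

If the three strings are identical, any code that matches on the namespace URI will treat the three documents the same way, whatever prefix they use.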
One possibility is to use the *NS (namespace aware) methods.
$response->getElementsByTagNameNS('http://schemas/Models/OTA', 'RoomRates')->item(0);
The other is to use Xpath and register prefixes for the namespaces. This can be the prefixes from the document, or different ones.
$document = new DOMDocument();
$document->load($xmlSendHotelRes);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('ota', 'http://schemas/Models/OTA');
var_dump(
$xpath->evaluate(
// double quotes, so that $roomID is interpolated into the expression
"string(//ota:RoomRates/ota:RoomRate[ota:RoomID=$roomID]/ota:Total)"
)
);
For a location path, DOMXpath::evaluate() would return a DOMNodeList but with string() it casts the first found node into a string and returns it.

You need to use a prefix (that you registered) and I think you want to start your path with .// and not with // if you want to search relative to the context node, so try ".//a:RoomRate[a:RoomID=$roomID]/a:Total"
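Putting both answers together, a minimal sketch could look like the following. It assumes a trimmed copy of the response is available as a string ($xml is my own stand-in for the real document) and registers the prefix ota, which is an arbitrary choice.

```php
<?php
// Trimmed copy of the response from the question (inlined here for the sketch).
$xml = <<<'XML'
<SendHotelResResult xmlns:a="http://schemas/Models/OTA">
  <a:RoomRates>
    <a:RoomRate>
      <a:RatesType><a:Rates><a:Rate>
        <a:Total><a:AmountBeforeTax>260.00</a:AmountBeforeTax></a:Total>
      </a:Rate></a:Rates></a:RatesType>
      <a:RoomID>52</a:RoomID>
      <a:Total>
        <a:AmountAfterTax>546.00</a:AmountAfterTax>
        <a:AmountBeforeTax>520.00</a:AmountBeforeTax>
        <a:CurrencyCode>EUR</a:CurrencyCode>
      </a:Total>
    </a:RoomRate>
  </a:RoomRates>
</SendHotelResResult>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
// Our own prefix; it does not have to match the "a" used in the document.
$xpath->registerNamespace('ota', 'http://schemas/Models/OTA');

$roomID = '52';
$roomRates = $document
    ->getElementsByTagNameNS('http://schemas/Models/OTA', 'RoomRates')
    ->item(0);

// ".//" searches below the context node. The child step "/ota:Total" only
// matches the Total directly inside RoomRate, not the ones nested in ota:Rate.
$totals = $xpath->query(".//ota:RoomRate[ota:RoomID=$roomID]/ota:Total", $roomRates);
$amountAfterTax = $xpath->evaluate('string(ota:AmountAfterTax)', $totals->item(0));

var_dump($totals->length, $amountAfterTax);
```

The per-night Total elements are skipped because they are grandchildren of Rate, not direct children of RoomRate, so no special OTA convention is needed to tell them apart.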

Related

Get all siblings-nodes where node equals... with xpath and namespace

I am using XPath to find content in an EPG file, but for this source my code simply won't work, and I have come to the point where I can't solve this myself.
The XML looks like this (as you see, 2 namespaces, v3 and v31).
<?xml version="1.0" encoding="UTF-8"?>
<v3:schedule timestamp="2017-05-12T16:11:06.595Z" xmlns:v3="http://common.tv.se/schedule/v3_1">
<v3:from>2017-05-12T22:00:00.000Z</v3:from>
<v3:to>2017-05-13T22:00:00.000Z</v3:to>
...
<v3:contentList>
<v31:content timestamp="2017-05-12T16:11:06.595Z" xmlns:v31="http://common.tv.se/content/v3_1">
<v31:contentId>content.1375706-006</v31:contentId>
<v31:seriesId>series.40542</v31:seriesId>
<v31:seasonNumber>3</v31:seasonNumber>
<v31:episodeNumber>6</v31:episodeNumber>
<v31:numberOfEpisodes>8</v31:numberOfEpisodes>
<v31:productionYear>2017</v31:productionYear>
...
<v3:eventList>
<v31:event timestamp="2017-05-12T16:11:06.595Z" xmlns:v31="http://common.tv.se/event/v3_1">
<v31:eventId>event.26072881</v31:eventId>
<v31:channelId>channel.24</v31:channelId>
<v31:rerun>true</v31:rerun>
<v31:live>false</v31:live>
<v31:hidden>false</v31:hidden>
<v31:description/>
<v31:timeList>
<v31:time type="public">
<v31:startTime>2017-05-12T22:55:00.000Z</v31:startTime>
<v31:endTime>2017-05-12T23:55:00.000Z</v31:endTime>
<v31:duration>01:00:00:00</v31:duration>
</v31:time>
</v31:timeList>
<v31:contentIdRef>content.1375706-006</v31:contentIdRef>
<v31:materialIdRef>material.1010161108005267221</v31:materialIdRef>
<v31:previousEventList/>
<v31:comingEventList/>
</v31:event>
...
<v3:materialList>
<v31:material timestamp="2017-05-12T16:11:06.595Z" xmlns:v31="http://common.tv.se/material/v3_1">
<v31:materialId>material.1010161108005267221</v31:materialId>
<v31:contentIdRef>content.1375706-006</v31:contentIdRef>
<v31:materialType>tx</v31:materialType>
<v31:videoFormat>576i</v31:videoFormat>
<v31:audioList>
<v31:format language="unknown">stereo</v31:format>
</v31:audioList>
<v31:aspectRatio>16:9</v31:aspectRatio>
<v31:materialReferenceList>
</v31:materialReferenceList>
</v31:material>
...
The "contentIdRef" is what ties the different elements (event and material) together.
I want to find all the data based on contentIdRef.
I have used this (in php):
$parent = $this->xmldata->xpath('//v31:event/v31:contentIdRef[.="content.1375706-006"]/parent::*')
and i have also tried
$parent = $this->xmldata->xpath('//v31:event/v31:contentIdRef[.="content.1375706-006"]/parent::*/child::*');
But the first alternative (with print_r) just returns the v31:event "timestamp" attribute,
and the second alternative returns 11 SimpleXML objects that are empty (why are they empty?). Based on the number of objects, I think I have "hit the spot", but I can't work out why they are empty.
And yes, I have registered namespaces throughout my code (I wish it were that simple).
TLDR;
I want to 1. get all contentIds from first block (v3:contentList),
2. get all eventdata for each contentId,
3. get all materialdata for each content id...
I sincerely hope you can help :/
Did you register prefixes for the namespaces in the Xpath expressions? Always register your own prefixes for the namespaces you're using. PHP registers the namespace definition of the current context node by default. But this can change on any element node in the document and not all prefixes might be defined on the document element.
$schedule = new SimpleXMLElement($xml);
$schedule->registerXpathNamespace('s', 'http://common.tv.se/schedule/v3_1');
$schedule->registerXpathNamespace('e', 'http://common.tv.se/event/v3_1');
$events = $schedule->xpath(
'//e:event[e:contentIdRef = "content.1375706-006"]'
);
foreach ($events as $event) {
echo $event->asXml(), "\n\n";
}
Or with DOM:
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('s', 'http://common.tv.se/schedule/v3_1');
$xpath->registerNamespace('e', 'http://common.tv.se/event/v3_1');
$events = $xpath->evaluate('//e:event');
foreach ($events as $event) {
echo $document->saveXml($event), "\n\n";
}
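For the three steps in the question (content ids, then events, then materials per id), a possible sketch with DOM follows. It uses a trimmed copy of the XML, and the prefixes c, e and m are my own choice; each one is bound to one of the three namespaces that the document all calls v31.

```php
<?php
// Trimmed copy of the schedule; note that "v31" is bound to a different
// namespace in each of the three lists.
$xml = <<<'XML'
<v3:schedule xmlns:v3="http://common.tv.se/schedule/v3_1">
  <v3:contentList>
    <v31:content xmlns:v31="http://common.tv.se/content/v3_1">
      <v31:contentId>content.1375706-006</v31:contentId>
    </v31:content>
  </v3:contentList>
  <v3:eventList>
    <v31:event xmlns:v31="http://common.tv.se/event/v3_1">
      <v31:eventId>event.26072881</v31:eventId>
      <v31:contentIdRef>content.1375706-006</v31:contentIdRef>
    </v31:event>
  </v3:eventList>
  <v3:materialList>
    <v31:material xmlns:v31="http://common.tv.se/material/v3_1">
      <v31:materialId>material.1010161108005267221</v31:materialId>
      <v31:contentIdRef>content.1375706-006</v31:contentIdRef>
    </v31:material>
  </v3:materialList>
</v3:schedule>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
$xpath->registerNamespace('c', 'http://common.tv.se/content/v3_1');
$xpath->registerNamespace('e', 'http://common.tv.se/event/v3_1');
$xpath->registerNamespace('m', 'http://common.tv.se/material/v3_1');

$result = [];
// 1. all content ids from the content list
foreach ($xpath->evaluate('//c:content/c:contentId') as $idNode) {
    $id = $idNode->textContent;
    $result[$id] = ['events' => [], 'materials' => []];
    // The ids in this feed contain no quotes, so embedding them in a
    // double-quoted XPath literal is safe for this sketch.
    // 2. events that reference the content id
    foreach ($xpath->evaluate('//e:event[e:contentIdRef = "' . $id . '"]') as $event) {
        $result[$id]['events'][] = $xpath->evaluate('string(e:eventId)', $event);
    }
    // 3. materials that reference the content id
    foreach ($xpath->evaluate('//m:material[m:contentIdRef = "' . $id . '"]') as $material) {
        $result[$id]['materials'][] = $xpath->evaluate('string(m:materialId)', $material);
    }
}

print_r($result);
```

This also shows why relying on the document's own v31 prefix is fragile: it means something different in every list, so a single registered v31 can only ever match one of them.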

Is it possible to get an attribute's value and the text within a node at the same time in XPath 1.0?

I've tried the solution here: Getting attribute using XPath
but it gives me an error.
I have some XHTML like this:
<a href="link.php">Click me!</a>
I'm recursively parsing the XML and trying to get both the href attribute (link.php) and the link text (Click me!) at the same time.
<?php
$node = $xpath->query('string(self::a/@href) | self::a/text()', $nodes->item(0));
This code throws the following error:
Warning: DOMXPath::query(): Invalid type
If I do either of these two separately they work, but not together:
<?php
$node = $xpath->evaluate('string(self::a/@href)', $nodes->item(0));
$node = $xpath->query('self::a/text()', $nodes->item(0));
If I use the following I get the whole attribute (href="link.php"), not just its value:
<?php
$node = $xpath->query('self::a/@href | self::a/text()', $nodes->item(0));
Is there any way of getting both text values at the same time using XPath 1.0 in PHP?
As suggested by others, you can use concat() (PHP's XPath supports it, see the demo below) to combine the value of an attribute with the content of an element.
Judging from your attempted code, i.e. the use of self::a, the problem with the XPath others suggested was probably that the context node ($nodes->item(0)) is already the <a> element. Relative to that context node, a/@href means "return the href attribute of the child element a of the current element", which is why you got no match. You were correct to use self::a in this case; alternatively, just use . which references the current context node:
$doc = new DOMDocument();
$xml = <<<XML
<root>
<a href="link.php">Click me!</a>
</root>
XML;
$doc->loadXML($xml);
$xpath = new DOMXpath($doc);
$nodes = $xpath->query('//a');
$node = $xpath->evaluate('concat(@href, "|", .)', $nodes->item(0));
echo $node;
eval.in demo
output :
link.php|Click me!
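If a separator character is awkward (for example when the link text may itself contain "|"), one alternative is to read the two values with two evaluate() calls on the same context node. This is a small sketch of that idea, not the only way to do it; the variable names are illustrative.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><a href="link.php">Click me!</a></root>');
$xpath = new DOMXPath($doc);

// The context node is the <a> element itself.
$link = $xpath->query('//a')->item(0);

// One call, one combined string (as in the answer above).
$combined = $xpath->evaluate('concat(@href, "|", .)', $link);

// Two calls, two separate strings; no separator to split on afterwards.
$href = $xpath->evaluate('string(@href)', $link);
$text = $xpath->evaluate('string(.)', $link);

var_dump($combined, $href, $text);
```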

Weird SimpleXML issue - can't reference nodes by name?

I'm trying to parse a remote XML file, which is valid:
$xml = simplexml_load_file('http://feeds.feedburner.com/HammersInTheHeart?format=xml');
The root element is feed, and I'm trying to grab it via:
$nodes = $xml->xpath('/feed'); //also tried 'feed', without slash
Except it doesn't find any nodes.
print_r($nodes); //empty array
Or any nodes of any kind, so long as I search for them by tag name, in fact:
$nodes = $xml->xpath('//entry');
print_r($nodes); //empty array
It does find nodes, however, if I use wildcards, e.g.
$nodes = $xml->xpath('/*/*[4]');
print_r($nodes); //node found
What's going on?
Unlike DOM, SimpleXML has no concept of a document object, only elements. So if you load an XML you always get the document element.
$feed = simplexml_load_file($xmlFile);
var_dump($feed->getName());
Output:
string(4) "feed"
That means that all Xpath expressions have to be relative to this element or absolute. A simple feed will not work because the context already is the feed element.
But there is another reason. The URL is an Atom feed, so the XML elements are in the namespace http://www.w3.org/2005/Atom. SimpleXML's magic syntax recognizes a default namespace for some calls, but Xpath does not. There is no default namespace in Xpath. You will have to register the namespaces with a prefix and use that prefix in your Xpath expressions.
$feed = simplexml_load_file($xmlFile);
$feed->registerXpathNamespace('a', 'http://www.w3.org/2005/Atom');
foreach ($feed->xpath('/a:feed/a:entry[position() < 3]') as $entry) {
var_dump((string)$entry->title);
}
Output:
string(24) "Sharing the goals around"
string(34) "Kouyate inspires Hammers' comeback"
However in SimpleXML the registration has to be done for each object you call the xpath() method on.
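A short sketch of what that means in practice, using a made-up one-entry feed instead of the remote URL: the prefix registered on $feed is registered again on each $entry before calling xpath() on it.

```php
<?php
// Tiny stand-in for the Atom feed (assumed content, not the real feed).
$xml = '<feed xmlns="http://www.w3.org/2005/Atom">'
     . '<entry><title>First</title></entry>'
     . '</feed>';

$feed = simplexml_load_string($xml);
$feed->registerXPathNamespace('a', 'http://www.w3.org/2005/Atom');

$titles = [];
foreach ($feed->xpath('/a:feed/a:entry') as $entry) {
    // $entry is a different SimpleXMLElement, so register the prefix again
    // before using it in an expression evaluated on $entry.
    $entry->registerXPathNamespace('a', 'http://www.w3.org/2005/Atom');
    foreach ($entry->xpath('a:title') as $title) {
        $titles[] = (string)$title;
    }
}

var_dump($titles);
```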
Using Xpath with DOM is slightly different but a lot more powerful.
$document = new DOMDocument();
$document->load($xmlFile);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('a', 'http://www.w3.org/2005/Atom');
foreach ($xpath->evaluate('/a:feed/a:entry[position() < 3]') as $entry) {
var_dump($xpath->evaluate('string(a:title)', $entry));
}
Output:
string(24) "Sharing the goals around"
string(34) "Kouyate inspires Hammers' comeback"
Xpath expressions used with DOMXpath::evaluate() can return scalar values.
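As a small illustration of those scalar results, the sketch below runs count(), boolean() and string() against a made-up two-entry feed (the XML is an assumption for the example, not the real feed).

```php
<?php
$xml = '<feed xmlns="http://www.w3.org/2005/Atom">'
     . '<entry><title>One</title></entry>'
     . '<entry><title>Two</title></entry>'
     . '</feed>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
$xpath->registerNamespace('a', 'http://www.w3.org/2005/Atom');

// XPath numbers come back as PHP floats.
$count = $xpath->evaluate('count(/a:feed/a:entry)');
// boolean() of an empty node set is false.
$hasThird = $xpath->evaluate('boolean(/a:feed/a:entry[3])');
// string() casts the first matching node to its text content.
$firstTitle = $xpath->evaluate('string(/a:feed/a:entry[1]/a:title)');

var_dump($count, $hasThird, $firstTitle);
```

DOMXpath::query() cannot do this; it only returns node lists, which is one reason to prefer evaluate().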

XML Xpath Failing on getElementsByTagName

<?xml version="1.0" encoding="UTF-8"?>
<AddProduct>
<auth><id>vendor123</id><auth_code>abc123</auth_code></auth>
</AddProduct>
What am I doing wrong to get : Fatal error: Call to undefined method DOMNodeList::getElementsByTagName()
$xml = $_GET['xmlRequest'];
$dom = new DOMDocument();
@$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$auth = $xpath->query('*/auth');
$id = $auth->getElementsByTagName('id')->item(0)->nodeValue;
$code = $auth->getElementsByTagName('auth_code')->item(0)->nodeValue;
You could retrieve the data (in the XML you posted) you want using XPath only:
$id = $xpath->query('//auth/id')->item(0)->nodeValue;
$code = $xpath->query('//auth/auth_code')->item(0)->nodeValue;
You are also calling getElementsByTagName() on $auth, which is a DOMNodeList (the result of DOMXPath::query()), as @Ohgodwhy pointed out in the comments; that is what causes the error. If you want to use it, call it on $dom or on an element taken from the list.
Your XPath expression */auth returns auth elements that are children of any child element of the current (context) node. Unless your XML file is different, it's clearer to use one of:
/*/auth # returns auth nodes two levels below root
/AddProduct/auth # returns auth nodes directly below /AddProduct
//auth # returns all auth nodes
This is what I came up with after reviewing php's documentation (http://us1.php.net/manual/en/class.domdocument.php, http://us1.php.net/manual/en/domdocument.loadxml.php, http://us3.php.net/manual/en/domxpath.query.php, http://us3.php.net/domxpath)
$dom = new DOMDocument();
$dom->loadXML($xml);
$id = $dom->getElementsByTagName("id")->item(0)->nodeValue;
$code = $dom->getElementsByTagName("auth_code")->item(0)->nodeValue;
As helderdarocha and Ohgodwhy pointed out, getElementsByTagName() is a DOMDocument method, not a method of the DOMNodeList that DOMXPath::query() returns. I like helderdarocha's solution that only uses XPath; the solution I posted accomplishes the same thing but only uses the DOMDocument.
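If you would rather keep the structure of the original code, a third option is to take a single element out of the node list first. A minimal sketch, with the request XML inlined instead of read from $_GET:

```php
<?php
$xml = '<AddProduct><auth><id>vendor123</id>'
     . '<auth_code>abc123</auth_code></auth></AddProduct>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// query() returns a DOMNodeList, which has no getElementsByTagName().
$authList = $xpath->query('/AddProduct/auth');
// item(0) returns the first node, or null when nothing matched.
$auth = $authList->item(0);

$id = null;
$code = null;
if ($auth instanceof DOMElement) {
    // DOMElement does provide getElementsByTagName().
    $id = $auth->getElementsByTagName('id')->item(0)->nodeValue;
    $code = $auth->getElementsByTagName('auth_code')->item(0)->nodeValue;
}

var_dump($id, $code);
```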

In DomDocument, reuse of DOMXpath, it is stable?

I am using the function below, but I am not sure whether it is always stable/secure... Is it?
When and how is it stable/secure to "reuse parts of the DOMXpath preparation procedure"?
To simplify the use of the XPath query() method we can adopt a function that memorizes the last calls with static variables,
function DOMXpath_reuser($file) {
static $doc=NULL;
static $docName='';
static $xp=NULL;
if (!$doc)
$doc = new DOMDocument();
if ($file!=$docName) {
$doc->loadHTMLFile($file);
$docName = $file; // remember the loaded file; otherwise every call reloads it
$xp = NULL;
}
if (!$xp)
$xp = new DOMXpath($doc);
return $xp; // ??RETURNED VALUES ARE ALWAYS STABLE??
}
The present question is similar to this other one about XSLTProcessor reuse.
In both questions the problem can be generalized for any language or framework that use LibXML2 as DomDocument implementation.
There is another related question: How to "refresh" DOMDocument instances of LibXML2?
Illustrating
This kind of reuse is very common (examples):
$f = "my_XML_file.xml";
$elements = DOMXpath_reuser($f)->query("//*[@id]");
// use elements to get information
$elements = DOMXpath_reuser($f)->query("/html/body/div[1]");
// use elements to get information
But if you do something like removeChild, replaceChild, etc. (example),
$div = DOMXpath_reuser($f)->query("/html/body/div[1]")->item(0); //STABLE
$div->parentNode->removeChild($div); // CHANGES DOM
$elements = DOMXpath_reuser($f)->query("//div[@id]"); // UNSTABLE!
strange things can occur, and the queries do not work as expected!
When does this happen (which DOMDocument methods affect XPath)?
Why can't we use something like normalizeDocument to "refresh the DOM" (does such a thing exist)?
Is only a "new DOMXpath($doc);" always safe? Do we need to reload $doc as well?
DOMXpath is affected by the load*() methods on DOMDocument. After loading a new xml or html, you need to recreate the DOMXpath instance:
$xml = '<xml/>';
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
var_dump($xpath->document === $dom); // bool(true)
$dom->loadXml($xml);
var_dump($xpath->document === $dom); // bool(false)
In DOMXpath_reuser() you store a static variable and recreate the xpath depending on the file name. If you want to reuse an Xpath object, I suggest extending DOMDocument. This way you only need to pass the $dom variable around. It would work with a stored xml file as well as with an xml string or a document you are creating.
The following class extends DOMDocument with a method xpath() that always returns a valid DOMXpath instance for it. It stores and registers the namespaces, too:
class MyDOMDocument
extends DOMDocument {
private $_xpath = NULL;
private $_namespaces = array();
public function xpath() {
// if the xpath instance is missing or not attached to the document
if (is_null($this->_xpath) || $this->_xpath->document != $this) {
// create a new one
$this->_xpath = new DOMXpath($this);
// and register the namespaces for it
foreach ($this->_namespaces as $prefix => $namespace) {
$this->_xpath->registerNamespace($prefix, $namespace);
}
}
return $this->_xpath;
}
public function registerNamespaces(array $namespaces) {
$this->_namespaces = array_merge($this->_namespaces, $namespaces);
if (isset($this->_xpath)) {
foreach ($namespaces as $prefix => $namespace) {
$this->_xpath->registerNamespace($prefix, $namespace);
}
}
}
}
$xml = <<<'ATOM'
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Test</title>
</feed>
ATOM;
$dom = new MyDOMDocument();
$dom->registerNamespaces(
array(
'atom' => 'http://www.w3.org/2005/Atom'
)
);
$dom->loadXml($xml);
// created, first access
var_dump($dom->xpath()->evaluate('string(/atom:feed/atom:title)', NULL, FALSE));
$dom->loadXml($xml);
// recreated, connection was lost
var_dump($dom->xpath()->evaluate('string(/atom:feed/atom:title)', NULL, FALSE));
The DOMXpath class (unlike XSLTProcessor in your other question) uses a reference to the DOMDocument object given in the constructor. DOMXpath creates a libxml context object based on the given DOMDocument and saves it in internal class data. Besides the libxml context, it saves a reference to the original DOMDocument given in the constructor arguments.
What that means:
Part of sample from ThomasWeinert answer:
var_dump($xpath->document === $dom); // bool(true)
$dom->loadXml($xml);
var_dump($xpath->document === $dom); // bool(false)
gives false after the load because $dom already holds a pointer to the new libxml data, but DOMXpath holds the libxml context for $dom from before the load and a pointer to the real document after the load.
Now, about how query() works:
If it should return XPATH_NODESET (as in your case), it makes a node copy, node by node, iterating through the detected node set (\ext\dom\xpath.c, from line 468). It is a copy, but with the original document node as parent. This means that you can modify the result, but doing so breaks the connection between your XPath and the DOMDocument.
XPath results provide a parentNode member that knows their origin:
for attribute values, parentNode returns the element that carries them. An example is //foo/@attribute, where the parent would be a foo Element.
for the text() function (as in //text()), it returns the element that contains the text or tail that was returned.
note that parentNode may not always return an element. For example, the XPath functions string() and concat() will construct strings that do not have an origin. For them, parentNode will return None.
So,
There is no reason to cache XPath. It does nothing besides xmlXPathNewContext (it just allocates a lightweight internal struct).
Each time you modify your DOMDocument (removeChild, replaceChild, etc.) you should recreate the XPath.
We cannot use something like normalizeDocument to "refresh the DOM" because it changes the internal document structure and invalidates the xmlXPathNewContext created in the Xpath constructor.
Is only "new DOMXpath($doc);" always safe? Yes, if you do not change $doc between Xpath usages. Do you need to reload $doc as well? No, because that invalidates the previously created xmlXPathNewContext.
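Following that advice, a minimal sketch of the "recreate after modifying" pattern could look like this. The HTML-like document is invented for the example, and whether recreating is strictly required in every PHP version is not something this sketch proves; it only shows the conservative approach recommended above.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<html><body>'
    . '<div id="a">first</div><div id="b">second</div><div>third</div>'
    . '</body></html>'
);

$xpath = new DOMXPath($doc);
$div = $xpath->query('/html/body/div[1]')->item(0);

// Modify the DOM ...
$div->parentNode->removeChild($div);

// ... then create a fresh DOMXPath for the same document before querying again.
// This is cheap: the constructor only sets up a new XPath context.
$xpath = new DOMXPath($doc);

$ids = [];
foreach ($xpath->query('//div[@id]') as $element) {
    $ids[] = $element->getAttribute('id');
}

var_dump($ids);
```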
(this is not a real answer, but a consolidation of comments and answers posted here and related questions)
This new version of the question's DOMXpath_reuser function incorporates @ThomasWeinert's suggestion (to guard against DOM changes caused by an external reload) and an option $enforceRefresh to work around the instability problem (as the related question shows, the programmer must detect when to use it).
function DOMXpath_reuser_v2($file, $enforceRefresh=0) { //changed here
static $doc=NULL;
static $docName='';
static $xp=NULL;
if (!$doc)
$doc = new DOMDocument();
if ( $file!=$docName || ($xp && $doc !== $xp->document) ) { // changed here
$doc->load($file);
$docName = $file; // remember which file is loaded
$xp = NULL;
} elseif ($enforceRefresh==2) { // add this new refresh mode
$doc->loadXML($doc->saveXML());
$xp = NULL;
}
if (!$xp || $enforceRefresh==1) //changed here
$xp = new DOMXpath($doc);
return $xp;
}
When must $enforceRefresh=1 be used?
... perhaps an open problem; only a few tips and clues...
when the DOM was subjected to setAttribute, removeChild, replaceChild, etc.
...? more cases?
When must $enforceRefresh=2 be used?
... perhaps an open problem; only a few tips and clues...
when the DOM was subject to index inconsistencies, etc. See this question/solution.
...? more cases?
