Get all siblings-nodes where node equals... with xpath and namespace - php

I am using XPath to find content in an EPG file, but for this source my code simply won't work, and I have reached the point where I can't solve it myself.
The XML looks like this (as you can see, there are two namespace prefixes, v3 and v31):
<?xml version="1.0" encoding="UTF-8"?>
<v3:schedule timestamp="2017-05-12T16:11:06.595Z" xmlns:v3="http://common.tv.se/schedule/v3_1">
<v3:from>2017-05-12T22:00:00.000Z</v3:from>
<v3:to>2017-05-13T22:00:00.000Z</v3:to>
...
<v3:contentList>
<v31:content timestamp="2017-05-12T16:11:06.595Z" xmlns:v31="http://common.tv.se/content/v3_1">
<v31:contentId>content.1375706-006</v31:contentId>
<v31:seriesId>series.40542</v31:seriesId>
<v31:seasonNumber>3</v31:seasonNumber>
<v31:episodeNumber>6</v31:episodeNumber>
<v31:numberOfEpisodes>8</v31:numberOfEpisodes>
<v31:productionYear>2017</v31:productionYear>
...
<v3:eventList>
<v31:event timestamp="2017-05-12T16:11:06.595Z" xmlns:v31="http://common.tv.se/event/v3_1">
<v31:eventId>event.26072881</v31:eventId>
<v31:channelId>channel.24</v31:channelId>
<v31:rerun>true</v31:rerun>
<v31:live>false</v31:live>
<v31:hidden>false</v31:hidden>
<v31:description/>
<v31:timeList>
<v31:time type="public">
<v31:startTime>2017-05-12T22:55:00.000Z</v31:startTime>
<v31:endTime>2017-05-12T23:55:00.000Z</v31:endTime>
<v31:duration>01:00:00:00</v31:duration>
</v31:time>
</v31:timeList>
<v31:contentIdRef>content.1375706-006</v31:contentIdRef>
<v31:materialIdRef>material.1010161108005267221</v31:materialIdRef>
<v31:previousEventList/>
<v31:comingEventList/>
</v31:event>
...
<v3:materialList>
<v31:material timestamp="2017-05-12T16:11:06.595Z" xmlns:v31="http://common.tv.se/material/v3_1">
<v31:materialId>material.1010161108005267221</v31:materialId>
<v31:contentIdRef>content.1375706-006</v31:contentIdRef>
<v31:materialType>tx</v31:materialType>
<v31:videoFormat>576i</v31:videoFormat>
<v31:audioList>
<v31:format language="unknown">stereo</v31:format>
</v31:audioList>
<v31:aspectRatio>16:9</v31:aspectRatio>
<v31:materialReferenceList>
</v31:materialReferenceList>
</v31:material>
...
The "contentIdRef" is what ties the different elements (event and material) together.
I want to find all the data based on contentIdRef.
I have used this (in PHP):
$parent = $this->xmldata->xpath('//v31:event/v31:contentIdRef[.="content.1375706-006"]/parent::*');
and I have also tried
$parent = $this->xmldata->xpath('//v31:event/v31:contentIdRef[.="content.1375706-006"]/parent::*/child::*');
But the first alternative (with print_r) just returns the v31:event "timestamp" attribute.
The second alternative returns 11 SimpleXMLElement objects that are empty (why are they empty?). Based on the number of objects I think I have "hit the spot", but I can't work out why they are empty.
And yes, I have registered namespaces throughout my code (I wish it were that simple).
TL;DR: I want to
1. get all contentIds from the first block (v3:contentList),
2. get all event data for each contentId,
3. get all material data for each contentId.
I sincerely hope you can help.

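Did you register prefixes for the namespaces used in the XPath expressions? Always register your own prefixes for the namespaces you're using. By default, PHP registers the namespace definitions of the current context node, but a prefix can be redefined on any element node in the document, and not every prefix has to be defined on the document element. In this feed, v31 is bound to a different namespace URI in each list (content, event, material), so it is safer to register a separate prefix per URI.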
$schedule = new SimpleXMLElement($xml);
$schedule->registerXpathNamespace('s', 'http://common.tv.se/schedule/v3_1');
$schedule->registerXpathNamespace('e', 'http://common.tv.se/event/v3_1');
$events = $schedule->xpath(
'//e:event[e:contentIdRef = "content.1375706-006"]'
);
foreach ($events as $event) {
echo $event->asXml(), "\n\n";
}
Or with DOM:
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('s', 'http://common.tv.se/schedule/v3_1');
$xpath->registerNamespace('e', 'http://common.tv.se/event/v3_1');
$events = $xpath->evaluate('//e:event[e:contentIdRef = "content.1375706-006"]');
foreach ($events as $event) {
echo $document->saveXml($event), "\n\n";
}
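The three steps from the question can be combined in one pass. The following is only a sketch under a few assumptions: the sample XML is a trimmed-down version of the feed, the prefixes s, c, e and m are arbitrary names chosen here (one per namespace URI), and the IDs are assumed not to contain quote characters, since they are interpolated into the expressions.

```php
<?php
// Trimmed-down sample; v31 is bound to a different URI in each list.
$xml = <<<'XML'
<v3:schedule xmlns:v3="http://common.tv.se/schedule/v3_1">
  <v3:contentList>
    <v31:content xmlns:v31="http://common.tv.se/content/v3_1">
      <v31:contentId>content.1375706-006</v31:contentId>
    </v31:content>
  </v3:contentList>
  <v3:eventList>
    <v31:event xmlns:v31="http://common.tv.se/event/v3_1">
      <v31:eventId>event.26072881</v31:eventId>
      <v31:contentIdRef>content.1375706-006</v31:contentIdRef>
    </v31:event>
  </v3:eventList>
  <v3:materialList>
    <v31:material xmlns:v31="http://common.tv.se/material/v3_1">
      <v31:materialId>material.1010161108005267221</v31:materialId>
      <v31:contentIdRef>content.1375706-006</v31:contentIdRef>
    </v31:material>
  </v3:materialList>
</v3:schedule>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// One self-chosen prefix per namespace URI.
$xpath->registerNamespace('s', 'http://common.tv.se/schedule/v3_1');
$xpath->registerNamespace('c', 'http://common.tv.se/content/v3_1');
$xpath->registerNamespace('e', 'http://common.tv.se/event/v3_1');
$xpath->registerNamespace('m', 'http://common.tv.se/material/v3_1');

$result = [];
// Step 1: every contentId in the content list.
foreach ($xpath->query('/s:schedule/s:contentList/c:content/c:contentId') as $idNode) {
    $id = $idNode->textContent;
    $events = [];
    $materials = [];
    // Step 2: events that reference this content ID.
    foreach ($xpath->query("//e:event[e:contentIdRef = '$id']") as $event) {
        $events[] = $xpath->evaluate('string(e:eventId)', $event);
    }
    // Step 3: materials that reference this content ID.
    foreach ($xpath->query("//m:material[m:contentIdRef = '$id']") as $material) {
        $materials[] = $xpath->evaluate('string(m:materialId)', $material);
    }
    $result[$id] = ['events' => $events, 'materials' => $materials];
}

print_r($result);
```

Inside the loops, $event and $material are complete DOMElement nodes, so any other child can be read the same way with a relative expression and the node as context.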

Related

Get children of an XML node with unknown structure

I'm trying to modify an XML document which contains some node that I can identify by name. For example, I might want to modify the <abc>some text</abc> node in a document (which I can identify by the tag name abc)
The problem I'm facing currently is that I don't know the exact structure of this document. I don't know what the root element is called and I don't know which children might contain this <abc> node.
I tried using SimpleXML<...> but this does not allow me to read arbitrary element children:
$xml = new SimpleXMLElement($xmlString);
foreach ($xml->children() as $child) {
// code here doesn't execute
}
I'm considering building my own XML parser which would have this simple functionality, but I cannot believe that simply iterating over all child nodes of a node (eventually recursively) is not something that is supported by PHP. Hopefully someone can tell me what I'm missing. Thanks!
Use DOMDocument
$dom = new DOMDocument();
@$dom->loadXML($xmlString);
foreach($dom->getElementsByTagName('item') as $item) {
if ($item->hasChildNodes()) {
foreach($item->childNodes as $i) {
// your code here
}
}
}
I found the solution moments after posting, after being stuck on it for a while.
SimpleXML<...> does not have these features, but DOMDocument and its associated classes do:
$dom = new DOMDocument();
$dom->loadXml($xmlString);
foreach($dom->childNodes as $child) {
if ($child->nodeName == "abc") {
$child->textContent = "modified text content";
}
}
Documentation for future reference, here: http://php.net/manual/en/book.dom.php
Thanks for your help.
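Note that $dom->childNodes only covers the top level of the document. When the element can sit at any depth, an XPath expression such as //abc (or getElementsByTagName('abc')) reaches it without knowing the structure. A minimal sketch, assuming a made-up sample document:

```php
<?php
// Hypothetical sample: <abc> appears at two different depths.
$xmlString = '<root><group><abc>some text</abc></group><abc>other</abc></root>';

$dom = new DOMDocument();
$dom->loadXML($xmlString);
$xpath = new DOMXPath($dom);

// "//abc" matches every abc element, whatever its parent is.
foreach ($xpath->query('//abc') as $node) {
    $node->nodeValue = 'modified text content';
}

$output = $dom->saveXML($dom->documentElement);
echo $output, "\n";
```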

Read XML File with DOMDocument in php

I want to read this xml document:
<?xml version="1.0" encoding="UTF-8"?>
<tns:getPDMNumber xmlns:tns="http://www.testgroup.com/TestPDM" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.testgroup.com/TestPDM getPDMNumber.xsd ">
<tns:getPDMNumberResponse>
<tns:requestID>22222</tns:requestID>
<tns:pdmNumber>654321</tns:pdmNumber>
<tns:responseCode>0</tns:responseCode>
</tns:getPDMNumberResponse>
</tns:getPDMNumber>
I tried it this way:
$dom->load('response/17_getPDMNumberResponse.xml');
$nodes = $dom->getElementsByTagName("tns:requestID");
//$nodes = $dom->getElementsByTagName("tns:getPDMNumber");
//$nodes = $dom->getElementsByTagName("tns:getPDMNumberResponse");
foreach($nodes as $node)
{
$response=$node->getElementsByTagName("tns:getPDMNumber");
foreach($response as $info)
{
$test = $info->getElementsByTagName("tns:pdmNumber");
$pdm = $test->nodeValue;
}
}
The code never enters the foreach loop.
Just to clarify, my goal is to read the "tns:pdmNumber" node.
Does anybody have an idea?
EDIT: I have also tried the commented-out lines.
The XML uses a namespace, so you should use the namespace-aware methods. They have the suffix NS.
$tns = 'http://www.testgroup.com/TestPDM';
$document = new DOMDocument();
$document->loadXml($xml);
foreach ($document->getElementsByTagNameNS($tns, "pdmNumber") as $node) {
var_dump($node->textContent);
}
Output:
string(6) "654321"
A better option is to use XPath expressions. They allow more comfortable access to DOM nodes. In this case you have to register a prefix for the namespace so that you can use it in the XPath expression:
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('t', 'http://www.testgroup.com/TestPDM');
var_dump(
$xpath->evaluate('string(/t:getPDMNumber/t:getPDMNumberResponse/t:pdmNumber)')
);
This:
$nodes = $dom->getElementsByTagName("tns:requestID");
finds all the requestID nodes, and you then loop over them. That's fine, but then you use each of those nodes as the basis for finding any getPDMNumber nodes UNDER the requestID, and there's nothing there: requestID is a terminal node. So
$response=$node->getElementsByTagName("tns:getPDMNumber");
finds nothing, and the inner loop has nothing to do.
It's like saying "Start digging a hole until you reach China. Once you reach China, keep digging until you reach Australia". But you can't keep digging: you've reached the "bottom", and the only thing deeper than China would be going into orbit.

Xpath in PHP with OTA standards

I have basic knowledge about the use of Xpath in PHP, but I'm having some troubles with a specific case and I think that the problem is in the standards.
This is the snippet of the XML and it's based on the OTA standards:
<SendHotelResResult xmlns:a="http://schemas/Models/OTA" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<a:RoomRates>
<a:RoomRate>
<a:EffectiveDate>2015-11-13T00:00:00</a:EffectiveDate>
<a:ExpireDate>2015-11-15T00:00:00</a:ExpireDate>
<a:RatePlanID>25</a:RatePlanID>
<a:RatesType>
<a:Rates>
<a:Rate>
<a:AgeQualifyingCode i:nil="true"/>
<a:EffectiveDate>2015-11-13T00:00:00</a:EffectiveDate>
<a:Total>
<a:AmountAfterTax>0</a:AmountAfterTax>
<a:AmountBeforeTax>260.00</a:AmountBeforeTax>
<a:CurrencyCode>EUR</a:CurrencyCode>
</a:Total>
</a:Rate>
<a:Rate>
<a:AgeQualifyingCode i:nil="true"/>
<a:EffectiveDate>2015-11-14T00:00:00</a:EffectiveDate>
<a:Total>
<a:AmountAfterTax>0</a:AmountAfterTax>
<a:AmountBeforeTax>260.00</a:AmountBeforeTax>
<a:CurrencyCode>EUR</a:CurrencyCode>
</a:Total>
</a:Rate>
</a:Rates>
</a:RatesType>
<a:RoomID>52</a:RoomID>
<a:Total>
<a:AmountAfterTax>546.00</a:AmountAfterTax>
<a:AmountBeforeTax>520.00</a:AmountBeforeTax>
<a:CurrencyCode>EUR</a:CurrencyCode>
</a:Total>
</a:RoomRate>
</a:RoomRates>
</SendHotelResResult>
What I want:
Get a specific <RoomRate> tag based on the element <RoomID>.
Get the global RoomRate <Total> tag. I don't want the <Total> tag that is inside the <Rate> tag. This is the reason why I'm using the xpath rather than a simple getElementsByTagName('Total'). I don't know if the OTA standards has some approach to differentiate the Total tags.
My attempts until now:
$dom = new DOMDocument();
$response = $dom->load($xmlSendHotelRes);
$roomID = '52';
$roomRatesTag = $response->getElementsByTagName('RoomRates')->item(0);
$prefix = $roomRatesTag->prefix;
$namespace = $roomRatesTag->lookupNamespaceURI($prefix);
$xpath = new DOMXpath($dom);
$xpath->registerNamespace($prefix, $namespace);
$roomRateTotal = $xpath->query("//RoomRate[RoomID=$roomID]/Total", $roomRatesTag, true);
I have already tried with and without $roomRatesTag as context, and also other expressions like:
./RoomRate[RoomID=$roomID]/Total, //RoomRate[RoomID=$roomID]/Total, //RoomRate/[RoomID=$roomID]/Total and //RoomRate/RoomID[text() = $roomID]/../Total, but none of them works.
Actually, even $roomRate = $xpath->query("//RoomRate"); returns an empty DOMNodeList, so I don't know what I'm doing wrong. I'm wondering whether the problem is in the standard, with two identical tags in different places, although that doesn't make much sense.
Are there other expressions that I should try?
You're fetching the namespace from the document.
$prefix = $roomRatesTag->prefix;
$namespace = $roomRatesTag->lookupNamespaceURI($prefix);
But this is neither necessary nor a good idea. You know that the document uses OTA, so you know the namespace is http://schemas/Models/OTA.
The prefix is just an alias for the actual namespace value. The following three XML examples all resolve to a node {http://schemas/Models/OTA}RoomRates:
<a:RoomRates xmlns:a="http://schemas/Models/OTA"/>
<ota:RoomRates xmlns:ota="http://schemas/Models/OTA"/>
<RoomRates xmlns="http://schemas/Models/OTA"/>
Your API calls have to look for nodes inside the namespace.
One possibility is to use the *NS (namespace-aware) methods.
$dom->getElementsByTagNameNS('http://schemas/Models/OTA', 'RoomRates')->item(0);
The other is to use XPath and register prefixes for the namespaces. These can be the prefixes from the document, or different ones.
$document = new DOMDocument();
$document->load($xmlSendHotelRes);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('ota', 'http://schemas/Models/OTA');
var_dump(
$xpath->evaluate(
// double quotes so that $roomID is interpolated into the expression
"string(//ota:RoomRates/ota:RoomRate[ota:RoomID=$roomID]/ota:Total)"
)
);
For a location path, DOMXpath::evaluate() would return a DOMNodeList but with string() it casts the first found node into a string and returns it.
You need to use a prefix (that you registered) and I think you want to start your path with .// and not with // if you want to search relative to the context node, so try ".//a:RoomRate[a:RoomID=$roomID]/a:Total"
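The difference between a child step and a descendant step relative to a context node can be seen in a short sketch. It assumes a shortened version of the XML from the question and a self-chosen prefix ota; $roomID is interpolated into the expression, so it is assumed to contain no quote characters.

```php
<?php
// Shortened sample: one Total inside a Rate, one directly under RoomRate.
$xml = <<<'XML'
<SendHotelResResult xmlns:a="http://schemas/Models/OTA">
  <a:RoomRates>
    <a:RoomRate>
      <a:RatesType><a:Rates><a:Rate>
        <a:Total><a:AmountBeforeTax>260.00</a:AmountBeforeTax></a:Total>
      </a:Rate></a:Rates></a:RatesType>
      <a:RoomID>52</a:RoomID>
      <a:Total><a:AmountBeforeTax>520.00</a:AmountBeforeTax></a:Total>
    </a:RoomRate>
  </a:RoomRates>
</SendHotelResResult>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
$xpath->registerNamespace('ota', 'http://schemas/Models/OTA');

$roomID = '52';
$roomRates = $xpath->query('//ota:RoomRates')->item(0);

// Child steps only: the Total directly under the matching RoomRate.
$totals = $xpath->query(
    "./ota:RoomRate[ota:RoomID = '$roomID']/ota:Total",
    $roomRates
);
$amount = $xpath->evaluate('string(ota:AmountBeforeTax)', $totals->item(0));

// Descendant step: also picks up the Total nested inside Rate.
$allTotals = $xpath->query('.//ota:Total', $roomRates);

var_dump($totals->length, $amount, $allTotals->length);
```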

Using DOMXml and Xpath, to update XML entries

Hello, I know there are many questions here about combining these three topics to update XML entries, but each one seems to be very specific to a given problem.
I have spent some time trying to understand XPath and how it works, but I still can't work out what I need to do.
Here we go.
I have this XML file
<?xml version="1.0" encoding="UTF-8"?>
<storagehouse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="schema.xsd">
<item id="c7278e33ef0f4aff88da10dfeeaaae7a">
<name>HDMI Cable 3m</name>
<weight>0.5</weight>
<category>Cables</category>
<location>B3</location>
</item>
<item id="df799fb47bc1e13f3e1c8b04ebd16a96">
<name>Dell U2410</name>
<weight>2.5</weight>
<category>Monitors</category>
<location>C2</location>
</item>
</storagehouse>
What I would like to do is update/edit any of the nodes above when I need to. I will build an HTML form for that.
But my biggest concern is: how do I find the desired node and update it?
Here is some of what I am trying to do:
<?php
function fnDOMEditElementCond()
{
$dom = new DOMDocument();
$dom->load('storage.xml');
$library = $dom->documentElement;
$xpath = new DOMXPath($dom);
// I kind of understand this one here
$result = $xpath->query('/storagehouse/item[1]/name');
//This one not so much
$result->item(0)->nodeValue .= ' Series';
// This will remove the CDATA property of the element.
//To retain it, delete this element (see delete eg) & recreate it with CDATA (see create xml eg).
//2nd Way
//$result = $xpath->query('/library/book[author="J.R.R.Tolkein"]');
// $result->item(0)->getElementsByTagName('title')->item(0)->nodeValue .= ' Series';
header("Content-type: text/xml");
echo $dom->saveXML();
}
?>
Could someone give me an example with attributes and so on, so that once a user decides to update a given node, I can find that node with XPath and then update it?
The following example uses SimpleXML, which is a close friend of DOMDocument. The XPath shown is the same regardless of which one you use; I use SimpleXML here to keep the code short. I'll show a more advanced DOMDocument example later on.
So about the XPath: how to find the node and update it. First of all, how to find the node:
The node has the element/tag name item. You are looking for it inside the storagehouse element, which is the root element of your XML document. All item elements in your document are expressed like this in XPath:
/storagehouse/item
From the root, first storagehouse, then item, separated by /. You already know that, so the interesting part is how to take only those item elements that have the specific ID. For that, a predicate is added at the end:
/storagehouse/item[@id="id"]
This will return all item elements again, but this time only those which have the attribute id with the value id (a string). For example, in your case, with the following XML:
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<storagehouse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="schema.xsd">
<item id="c7278e33ef0f4aff88da10dfeeaaae7a">
<name>HDMI Cable 3m</name>
<weight>0.5</weight>
<category>Cables</category>
<location>B3</location>
</item>
<item id="df799fb47bc1e13f3e1c8b04ebd16a96">
<name>Dell U2410</name>
<weight>2.5</weight>
<category>Monitors</category>
<location>C2</location>
</item>
</storagehouse>
XML;
that xpath:
/storagehouse/item[@id="df799fb47bc1e13f3e1c8b04ebd16a96"]
will return the computer monitor (because an item with that id exists). If there were multiple items with the same id value, multiple would be returned. If there were none, none would be returned. So let's wrap that into a code example:
$simplexml = simplexml_load_string($xml);
$result = $simplexml->xpath(sprintf('/storagehouse/item[@id="%s"]', $id));
if (!$result || count($result) !== 1) {
throw new Exception(sprintf('Item with id "%s" does not exists or is not unique.', $id));
}
list($item) = $result;
In this example, $item is the SimpleXMLElement object of that computer monitor XML element named item.
So now for the changes, which are extremely easy with SimpleXML in your case:
$item->category = 'LCD Monitor';
And to finally see the result:
echo $simplexml->asXML();
Yes that's all with SimpleXML in your case.
If you want to do this with DOMDocument, it works quite similarly. However, to update an element's value you need to access the child element of that item as well. The following example first fetches the item; if you compare it with the SimpleXML example above, you can see that things do not really differ:
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$result = $xpath->query(sprintf('/storagehouse/item[@id="%s"]', $id));
if (!$result || $result->length !== 1) {
throw new Exception(sprintf('Item with id "%s" does not exists or is not unique.', $id));
}
$item = $result->item(0);
Again, $item contains the item XML element of the computer monitor, but this time as a DOMElement. To modify the category element in there (or more precisely its nodeValue), that child needs to be obtained first. You can do this again with XPath, but this time with an expression relative to the $item element:
./category
Assuming that there always is a category child-element in the item element, this could be written as such:
$category = $xpath->query('./category', $item)->item(0);
$category does now contain the first category child element of $item. What's left is updating the value of it:
$category->nodeValue = "LCD Monitor";
And to finally see the result:
echo $doc->saveXML();
And that's it. Whether you choose SimpleXML or DOMDocument depends on your needs. You can even switch between both. You might also want to map the items to objects and check for changes:
$repository = new Repository($xml);
$item = $repository->getItemByID($id);
$item->category = 'LCD Monitor';
$repository->saveChanges();
echo $repository->getXML();
Naturally this requires more code, which is too much for this answer.
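For orientation only, a much reduced version of such a class could look like the sketch below. The Repository class is hypothetical: its name and the getItemByID() and getXML() methods are taken from the usage snippet above, while the implementation is an illustrative guess. It skips saveChanges() and any change tracking, because SimpleXML elements are live views of the document and edits show up in getXML() immediately. The ID is assumed not to contain double quotes.

```php
<?php
// Hypothetical Repository: a thin wrapper around SimpleXML, not a full mapper.
final class Repository
{
    /** @var SimpleXMLElement */
    private $xml;

    public function __construct(string $xml)
    {
        $this->xml = new SimpleXMLElement($xml);
    }

    public function getItemByID(string $id): SimpleXMLElement
    {
        // Assumes $id contains no double quotes (true for the hex IDs used here).
        $result = $this->xml->xpath(sprintf('/storagehouse/item[@id="%s"]', $id));
        if (!$result || count($result) !== 1) {
            throw new Exception(sprintf('Item with id "%s" does not exist or is not unique.', $id));
        }
        return $result[0];
    }

    public function getXML(): string
    {
        return $this->xml->asXML();
    }
}

$xml = '<storagehouse>'
    . '<item id="df799fb47bc1e13f3e1c8b04ebd16a96">'
    . '<name>Dell U2410</name><category>Monitors</category>'
    . '</item>'
    . '</storagehouse>';

$repository = new Repository($xml);
$item = $repository->getItemByID('df799fb47bc1e13f3e1c8b04ebd16a96');
$item->category = 'LCD Monitor';   // modifies the underlying document directly
echo $repository->getXML();
```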

PHP Dealing with missing XML data

If I have three sets of data, say:
<note><from>Me</from><to>someone</to><message>hello</message></note>
<note><from>Me</from><to></to><message>Need milk & eggs</message></note>
<note><from>Me</from><message>Need milk & eggs</message></note>
and I'm using simplexml is there a way to have simple xml check that there's an empty/absent tag automatically?
I would like the output to be:
FROM TO MESSAGE
Me someone hello
Me NULL Need milk & eggs
Me NULL Need milk & eggs
Right now I'm doing it manually and I quickly realised that it's going to take a very long time to do it for long xml files.
My current sample code:
$xml = simplexml_load_string($string);
if ($xml->from != "") {$out .= $xml->from."\t"} else {$out .= "NULL\t";}
//repeat for all children, checking by name
Sometimes the order is different as well, there might be a xml with:
<note><message>pick up cd</message><from>me</from></note>
so iterating through the children and checking by index count doesn't work.
The actual xml files I'm working with are thousands of lines each, so I obviously can't just code in every tag.
It sounds like you need a DTD (Document Type Definition), which will define the required format of the XML file, and specify which elements are required, optional, what they can contain, etc.
DTDs can be used to validate an XML file before you do any processing with it.
Unfortunately, PHP's simplexml library doesn't do anything with DTD, but the DomDocument library does, so you may want to use that instead.
I'll leave it as a separate exercise for you to research how to create a DTD file. If you need more help with that, I'd suggest asking it as a separate question.
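To give a rough idea of what validation looks like, here is a sketch that uses DOMDocument::validate() with an inline DTD. The DTD itself is an assumption made up for this example; note that its content model (from, to, message) also fixes the element order, which may be stricter than the files in the question allow.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Assumed DTD for illustration: every note needs from, to and message, in that order.
$dtd = '<!DOCTYPE note [
  <!ELEMENT note (from, to, message)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT message (#PCDATA)>
]>';

function isValidNote(string $dtd, string $note): bool
{
    $document = new DOMDocument();
    $document->loadXML($dtd . $note);
    // validate() checks the document against the DTD declared in it.
    return $document->validate();
}

$complete  = isValidNote($dtd, '<note><from>Me</from><to>someone</to><message>hello</message></note>');
$missingTo = isValidNote($dtd, '<note><from>Me</from><message>hello</message></note>');

var_dump($complete, $missingTo);
```

Validation only reports that a document is incomplete; it does not fill in the missing values.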
You could use the DOMDocument instead. I have created a quick demo that splits the <note> elements into an array using the XML tag names as keys. You could then iterate the resultant array to create your output.
I corrected the invalid XML by replacing the ampersand with the HTML entity equivalent (&amp;).
<?php
libxml_use_internal_errors(true);
$xml = <<<XML
<notes>
<note><from>Me</from><to>someone</to><message>hello</message></note>
<note><from>Me</from><to></to><message>Need milk &amp; eggs</message></note>
<note><from>Me</from><message>Need milk &amp; eggs</message></note>
<note><message>pick up cd</message><from>me</from></note>
</notes>
XML;
function getNotes($nodelist) {
$notes = array();
foreach ($nodelist as $node) {
$noteParts = array();
foreach ($node->childNodes as $child) {
$noteParts[$child->tagName] = $child->nodeValue;
}
$notes[] = $noteParts;
}
return $notes;
}
$dom = new DOMDocument();
$dom->recover = true;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$nodelist = $xpath->query("//note");
$notes = getNotes($nodelist);
print_r($notes);
?>
Edit: If you change $noteParts = array(); to $noteParts = array('from' => null, 'to' => null, 'message' => null); then it will always create the full set of keys.
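The tab-separated output from the question can also be produced without hard-coding the tag names, by collecting them from the document first. This is a sketch that uses SimpleXML rather than the DOM code above, and it assumes the notes are wrapped in a <notes> root element and that empty and absent tags should both print as NULL.

```php
<?php
$xml = '<notes>'
    . '<note><from>Me</from><to>someone</to><message>hello</message></note>'
    . '<note><from>Me</from><to></to><message>Need milk &amp; eggs</message></note>'
    . '<note><from>Me</from><message>Need milk &amp; eggs</message></note>'
    . '<note><message>pick up cd</message><from>me</from></note>'
    . '</notes>';

$root = simplexml_load_string($xml);

// First pass: collect every tag name that occurs in any note, in order of first appearance.
$columns = [];
foreach ($root->note as $note) {
    foreach ($note->children() as $child) {
        $columns[$child->getName()] = true;
    }
}
$columns = array_keys($columns);

// Second pass: one row per note, looked up by name so element order does not matter.
$lines = [strtoupper(implode("\t", $columns))];
foreach ($root->note as $note) {
    $row = [];
    foreach ($columns as $column) {
        $value = trim((string) $note->$column);   // missing child casts to ''
        $row[] = $value === '' ? 'NULL' : $value;
    }
    $lines[] = implode("\t", $row);
}

$out = implode("\n", $lines);
echo $out, "\n";
```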
