Using DOMXml and Xpath, to update XML entries - php

Hello, I know there are many questions here about these three topics combined to update XML entries, but each one seems very specific to a given problem.
I have spent some time trying to understand XPath and how it works, but I still can't work out what I need to do.
Here we go.
I have this XML file
<?xml version="1.0" encoding="UTF-8"?>
<storagehouse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="schema.xsd">
<item id="c7278e33ef0f4aff88da10dfeeaaae7a">
<name>HDMI Cable 3m</name>
<weight>0.5</weight>
<category>Cables</category>
<location>B3</location>
</item>
<item id="df799fb47bc1e13f3e1c8b04ebd16a96">
<name>Dell U2410</name>
<weight>2.5</weight>
<category>Monitors</category>
<location>C2</location>
</item>
</storagehouse>
What I would like to do is update/edit any of the nodes above when I need to. I will build an HTML form for that.
But my biggest concern is: how do I find the desired node and update it?
Here I have some of what I am trying to do
<?php
function fnDOMEditElementCond()
{
$dom = new DOMDocument();
$dom->load('storage.xml');
$library = $dom->documentElement;
$xpath = new DOMXPath($dom);
// I kind of understand this one here
$result = $xpath->query('/storagehouse/item[1]/name');
//This one not so much
$result->item(0)->nodeValue .= ' Series';
// This will remove the CDATA property of the element.
//To retain it, delete this element (see delete eg) & recreate it with CDATA (see create xml eg).
//2nd Way
//$result = $xpath->query('/library/book[author="J.R.R.Tolkein"]');
// $result->item(0)->getElementsByTagName('title')->item(0)->nodeValue .= ' Series';
header("Content-type: text/xml");
echo $dom->saveXML();
}
?>
Could someone give me an example with attributes and so on, so that once a user decides to update a given node, I can find that node with XPath and then update it?

The following example uses SimpleXML, which is a close relative of DOMDocument. The xpath shown is the same regardless of which one you use, and I use SimpleXML here to keep the code short. I'll show a more advanced DOMDocument example later on.
So about the xpath: How to find the node and update it. First of all how to find the node:
The node has the element/tagname item. You are looking for it inside the storagehouse element, which is the root element of your XML document. All item elements in your document are expressed like this in xpath:
/storagehouse/item
From the root, first storagehouse, then item. Divided with /. You already know that, so the interesting part is how to only take those item elements that have the specific ID. For that the predicate is used and added at the end:
/storagehouse/item[@id="id"]
This will return all item elements again, but this time only those which have the attribute id with the value id (string). For example in your case with the following XML:
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<storagehouse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="schema.xsd">
<item id="c7278e33ef0f4aff88da10dfeeaaae7a">
<name>HDMI Cable 3m</name>
<weight>0.5</weight>
<category>Cables</category>
<location>B3</location>
</item>
<item id="df799fb47bc1e13f3e1c8b04ebd16a96">
<name>Dell U2410</name>
<weight>2.5</weight>
<category>Monitors</category>
<location>C2</location>
</item>
</storagehouse>
XML;
that xpath:
/storagehouse/item[@id="df799fb47bc1e13f3e1c8b04ebd16a96"]
will return the computer monitor (because an item with that id exists). If there were multiple items with the same id value, all of them would be returned. If there were none, none would be returned. So let's wrap that into a code example:
$simplexml = simplexml_load_string($xml);
$result = $simplexml->xpath(sprintf('/storagehouse/item[@id="%s"]', $id));
if (!$result || count($result) !== 1) {
throw new Exception(sprintf('Item with id "%s" does not exist or is not unique.', $id));
}
list($item) = $result;
In this example, $item is the SimpleXMLElement object of the computer monitor's item XML element.
So now for the changes, which are extremely easy with SimpleXML in your case:
$item->category = 'LCD Monitor';
And to finally see the result:
echo $simplexml->asXML();
Yes that's all with SimpleXML in your case.
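Since the id will come from an HTML form, it may be worth checking it before it is placed into the xpath string. The following is only a sketch: findItemById() is a hypothetical helper (not part of SimpleXML), and it assumes ids are lowercase hex strings like the ones in the sample XML.

```php
<?php
// Hypothetical helper, not a SimpleXML function.
function findItemById(SimpleXMLElement $root, string $id): SimpleXMLElement
{
    // Assumption: ids look like the sample ones (lowercase hex only).
    // Rejecting anything else keeps quotes out of the xpath expression.
    if (!preg_match('/^[0-9a-f]+$/', $id)) {
        throw new InvalidArgumentException('Unexpected id format.');
    }

    $result = $root->xpath(sprintf('/storagehouse/item[@id="%s"]', $id));
    if (!$result || count($result) !== 1) {
        throw new RuntimeException(sprintf('Item with id "%s" does not exist or is not unique.', $id));
    }

    return $result[0];
}

// Shortened copy of the sample document.
$xml = '<storagehouse>'
     . '<item id="c7278e33ef0f4aff88da10dfeeaaae7a"><name>HDMI Cable 3m</name><category>Cables</category></item>'
     . '<item id="df799fb47bc1e13f3e1c8b04ebd16a96"><name>Dell U2410</name><category>Monitors</category></item>'
     . '</storagehouse>';

$root = simplexml_load_string($xml);
$item = findItemById($root, 'df799fb47bc1e13f3e1c8b04ebd16a96');
$item->category = 'LCD Monitor';   // updates the underlying document
$updated = $root->asXML();
```

To write the result back to disk, asXML() also accepts a filename, for example $root->asXML('storage.xml').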
If you want to do this with DOMDocument, it works quite similarly. However, to update an element's value, you need to access the child element of that item as well. The following example first fetches the item again. If you compare it with the SimpleXML example above, you can see that things do not really differ:
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$result = $xpath->query(sprintf('/storagehouse/item[@id="%s"]', $id));
if (!$result || $result->length !== 1) {
throw new Exception(sprintf('Item with id "%s" does not exist or is not unique.', $id));
}
$item = $result->item(0);
Again, $item contains the item XML element of the computer monitor, but this time as a DOMElement. To modify the category element in there (or more precisely its nodeValue), that child needs to be obtained first. You can do this again with xpath, but this time with an expression relative to the $item element:
./category
Assuming that there always is a category child-element in the item element, this could be written as such:
$category = $xpath->query('./category', $item)->item(0);
$category does now contain the first category child element of $item. What's left is updating the value of it:
$category->nodeValue = "LCD Monitor";
And to finally see the result:
echo $doc->saveXML();
And that's it. Whether you choose SimpleXML or DOMDocument depends on your needs. You can even switch between both. You might also want to map the items to objects and check for changes:
$repository = new Repository($xml);
$item = $repository->getItemByID($id);
$item->category = 'LCD Monitor';
$repository->saveChanges();
echo $repository->getXML();
Naturally this requires more code, which is too much for this answer.
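For orientation only, here is one way such a Repository could look. This is a minimal sketch under assumptions, not the class the answer had in mind: the method names are taken from the snippet above, everything else (stdClass items, the private members, PHP 7.4+ typed properties) is made up for illustration.

```php
<?php
// Hypothetical sketch; only the public method names come from the snippet above.
final class Repository
{
    private DOMDocument $doc;
    private DOMXPath $xpath;
    /** @var array<int, array{0: stdClass, 1: DOMElement}> items handed out so far */
    private array $loaded = [];

    public function __construct(string $xml)
    {
        $this->doc = new DOMDocument();
        $this->doc->loadXML($xml);
        $this->xpath = new DOMXPath($this->doc);
    }

    public function getItemByID(string $id): stdClass
    {
        // Comparing the attribute in PHP avoids quoting the id inside xpath.
        foreach ($this->xpath->query('/storagehouse/item') as $element) {
            if ($element->getAttribute('id') !== $id) {
                continue;
            }
            $item = new stdClass();
            foreach ($this->xpath->query('./*', $element) as $child) {
                $item->{$child->nodeName} = $child->textContent;
            }
            $this->loaded[] = [$item, $element];
            return $item;
        }
        throw new RuntimeException(sprintf('Item with id "%s" does not exist.', $id));
    }

    public function saveChanges(): void
    {
        // Write back only the child elements whose mapped value changed.
        foreach ($this->loaded as [$item, $element]) {
            foreach ($this->xpath->query('./*', $element) as $child) {
                $name = $child->nodeName;
                if (isset($item->$name) && (string) $item->$name !== $child->textContent) {
                    $child->textContent = (string) $item->$name;
                }
            }
        }
    }

    public function getXML(): string
    {
        return $this->doc->saveXML();
    }
}

$xml = '<storagehouse><item id="a1"><name>Dell U2410</name><category>Monitors</category></item></storagehouse>';
$repository = new Repository($xml);
$item = $repository->getItemByID('a1');
$item->category = 'LCD Monitor';
$repository->saveChanges();
$result = $repository->getXML();
```

The sketch ignores adding or deleting child elements; it only shows the find, map, and write-back steps.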

Related

Modify XML in PHP

I have the xml below
<?xml version="1.0" encoding="UTF-8"?>
<!--Sample XML file generated by XMLSpy v2013 (http://www.altova.com)-->
<ftc:FATCA_OECD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ftc="urn:oecd:ties:fatca:v2" xmlns:sfa="urn:oecd:ties:stffatcatypes:v2" version="2.0" xsi:schemaLocation="urn:oecd:ties:fatca:v2 FatcaXML_v2.0.xsd">
<ftc:MessageSpec>
<sfa:SendingCompanyIN>S519K4.99999.SL.392</sfa:SendingCompanyIN>
<sfa:TransmittingCountry>JP</sfa:TransmittingCountry>
<sfa:ReceivingCountry>US</sfa:ReceivingCountry>
<sfa:MessageType>FATCA</sfa:MessageType>
<sfa:MessageRefId>DBA6455E-8454-47D9-914B-FEE48E4EF3AA</sfa:MessageRefId>
<sfa:ReportingPeriod>2016-12-31</sfa:ReportingPeriod>
<sfa:Timestamp>2017-01-17T09:30:47Z</sfa:Timestamp>
<ftc:SendingCompanyIN>testing</ftc:SendingCompanyIN></ftc:MessageSpec>
<ftc:FATCA>
<ftc:ReportingFI>
<sfa:ResCountryCode>JP</sfa:ResCountryCode>
<sfa:TIN>S519K4.99999.SL.392</sfa:TIN>
<sfa:Name>Bank of NN</sfa:Name>
<sfa:Address>
<sfa:CountryCode>JP</sfa:CountryCode>
<sfa:AddressFree>123 Main Street</sfa:AddressFree>
</sfa:Address>
<ftc:DocSpec>
<ftc:DocTypeIndic>FATCA1</ftc:DocTypeIndic>
<ftc:DocRefId>S519K4.99999.SL.392.50B80D2D-79DA-4AFD-8148-F06480FFDEB5</ftc:DocRefId>
</ftc:DocSpec>
</ftc:ReportingFI>
<ftc:ReportingGroup>
<ftc:NilReport>
<ftc:DocSpec>
<ftc:DocTypeIndic>FATCA1</ftc:DocTypeIndic>
<ftc:DocRefId>S519K4.99999.SL.392.CE54CA78-7C31-4EC2-B73C-E387C314F426</ftc:DocRefId>
</ftc:DocSpec>
<ftc:NoAccountToReport>yes</ftc:NoAccountToReport>
</ftc:NilReport>
</ftc:ReportingGroup>
</ftc:FATCA>
</ftc:FATCA_OECD>
I want to change the value of the sfa:TIN node and save the XML in a new file. How can this be accomplished in PHP? I found examples, but none of them used namespaces.
One way to do this is to use DOMDocument and DOMXPath and find your elements with an xpath expression that selects the 'TIN' elements in the sfa namespace, for example:
/ftc:FATCA_OECD/ftc:FATCA/ftc:ReportingFI/sfa:TIN
To update the value of the first element found, you can take the first item from the DOMNodeList that query returns.
$doc = new DOMDocument();
$doc->loadXML($data);
$xpath = new DOMXPath($doc);
$res = $xpath->query("/ftc:FATCA_OECD/ftc:FATCA/ftc:ReportingFI/sfa:TIN");
if ($res->length > 0) {
$res[0]->nodeValue = "test";
}
$doc->save("yourfilename.xml");
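If you would rather not depend on the prefixes used inside the document, the namespaces can also be registered explicitly with DOMXPath::registerNamespace(). A small sketch, assuming a trimmed-down copy of the document; the prefixes f and s are arbitrary and only exist for the query:

```php
<?php
// Trimmed-down copy of the document from the question.
$data = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<ftc:FATCA_OECD xmlns:ftc="urn:oecd:ties:fatca:v2" xmlns:sfa="urn:oecd:ties:stffatcatypes:v2" version="2.0">
  <ftc:FATCA>
    <ftc:ReportingFI>
      <sfa:TIN>S519K4.99999.SL.392</sfa:TIN>
    </ftc:ReportingFI>
  </ftc:FATCA>
</ftc:FATCA_OECD>
XML;

$doc = new DOMDocument();
$doc->loadXML($data);

$xpath = new DOMXPath($doc);
// What matters is the namespace URI, not the prefix written in the file.
$xpath->registerNamespace('f', 'urn:oecd:ties:fatca:v2');
$xpath->registerNamespace('s', 'urn:oecd:ties:stffatcatypes:v2');

$tin = $xpath->query('/f:FATCA_OECD/f:FATCA/f:ReportingFI/s:TIN')->item(0);
if ($tin !== null) {
    $tin->textContent = 'test';
}

$output = $doc->saveXML();
```

Saving to a new file then works as before with $doc->save().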
You can use the following solution, using DOMDocument::getElementsByTagNameNS:
<?php
$dom = new DOMDocument();
$dom->load('old-file.xml');
//get all TIN nodes.
$nodesTIN = $dom->getElementsByTagNameNS('urn:oecd:ties:stffatcatypes:v2', 'TIN');
//check for existing TIN node.
if ($nodesTIN->length > 0) {
//update the first TIN node.
$nodesTIN->item(0)->nodeValue = 'NEWVALUE_OF_TIN';
}
//save the file to a new one.
$dom->save('new-file.xml');

Reading and writing Associative array to XML file

I am trying to add an array $item to an XML file in order to then be able to read all of the items in a later time.
I have the following PHP to perform this action:
<?php
$item = array();
$item['rating'] = $_GET['rating'];
$item['comment'] = $_GET['comment'];
$item['item_id'] = $_GET['item_id'];
$item['status'] = "pending";
//Defining $xml
$xml = new SimpleXMLElement('<root/>');
array_walk_recursive($item, array($xml, 'addChild'));
$xml = $xml->asXML();
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadXML($xml);
//Save XML as a file
$dom->save('reviews.xml');
However, when I run it, this is all I get in my XML file:
<?xml version="1.0"?>
Basically my array is nowhere to be seen.
A var_dump of $item gives
array(4) { ["rating"]=> string(1) "8" ["comment"]=> string(17) "I Really Like it!" ["item_id"]=> string(1) "9" ["status"]=> string(7) "pending" }
How could I modify my code so that it saves an array (and, if there are many, keeps them all) in the file reviews.xml?
Also, how could I make it so that later on I can access the data, for instance to change the status from pending to approved?
EDIT:
Using the following code I have been able to save my item to the file:
$item = array();
$item[$_GET['rating']] = 'rating';
$item[$_GET['comment']] = 'comment';
$item[$_GET['item_id']] = 'item_id';
$item['pending'] = 'status';
$xml = new SimpleXMLElement('<root/>');
array_walk_recursive($item, array($xml, 'addChild'));
$xml->asXML('reviews.xml');
However I am still unable to append new data to the root rather than overwriting the current saved data.
As I was saying in my comment... The code you provided errors with WARNING DOMDocument::loadXML(): Empty string supplied as input. You never assigned anything to $xml.
Proper error reporting/logging would help spot these mistakes.
<?php
$item = array();
$item['rating'] = 'a';
$item['comment'] = 'b';
$item['item_id'] = 'c';
$item['status'] = "pending";
//Defining $xml
$xml = new SimpleXMLElement('<root/>');
array_walk_recursive($item, array($xml, 'addChild'));
//THIS IS THE LINE YOU WERE MISSING
$xml = $xml->asXML();
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadXML($xml);
//Save XML as a file
$dom->save('reviews.xml');
If you echoed it out...
var_dump($dom->saveHTML());
> string(80)
> "<root><a>rating</a><b>comment</b><c>item_id</c><pending>status</pending></root>"
Please avoid updating your existing question with additional questions.
A database would make the task easier. Using a flat file works fine though, XML, or some other format. You will need to be able to retrieve a record by item_id, at which point you modify it, then replace it. That is the gist of it.
So here's an overhaul of your code, with some changes to both your approach and the scheme of your XML, based on your various comments and updates.
So first, instead of creating XML that looks like this:
<root>
<rating>a</rating>
<comment>b</comment>
<item_id>c</item_id>
<status>pending</status>
</root>
You're going to store the XML like this:
<root>
<item id="c">
<rating>a</rating>
<comment>b</comment>
<status>pending</status>
</item>
</root>
This is based on a few of your comments:
You are wanting to add to the XML file rather than overwrite the existing file content. That suggests that you want to store multiple items. This would also explain why you have a property item_id. So rather than having a mess of XML like :
<root>
<rating>a</rating>
<comment>b</comment>
<item_id>c</item_id>
<status>pending</status>
<rating>d</rating>
<comment>e</comment>
<item_id>f</item_id>
<status>pending</status>
<rating>g</rating>
<comment>h</comment>
<item_id>i</item_id>
<status>pending</status>
</root>
where it is impossible to know which item is which, you store each set of item properties on an <item> element. Since you are going to want to easily grab an item based on its item_id in order to update that item, making item_id an attribute of the <item> makes more sense than making it a child of the <item>.
You want to be able to update the status. This is where having the item_id stored on the item comes in handy. If someone submits a request with an existing item_id, you can update that item, including its status element. Or you could do it whenever you need to from some other process, etc.
Here's the code I drummed up for this. Note that it currently isn't set up to look for an existing element with that item id, but that should be possible using existing SimpleXML functions/methods.
$item = array();
$item_id = "c";
$item['rating'] = 'a';
$item['comment'] = 'b';
$item['status'] = "pending";
$xml = simplexml_load_file('ratings.xml');
//if ratings.xml not found or not valid xml, create clean XML with <root/>
if($xml === false) {
$xml = new SimpleXMLElement('<root/>');
}
$xml_item = $xml->addChild("item");
$xml_item->addAttribute("id", $item_id);
foreach($item as $name => $value) {
$xml_item->addChild($name, $value);
}
$xml->asXML('ratings.xml');
Notice that one of the major changes I made to your existing code is changing from using array_walk_recursive to a simple foreach. array_walk_recursive for this purpose is a short cut that causes more issues than it solves. For instance, you had to swap your key and value on the $item array, which is confusing. It also isn't necessary for what you currently are doing, since you don't have a multi-dimensional array. And even if you did, array_walk_recursive isn't the right choice to handle looping over the array recursively because it would add each array member to the root of the XML, not add sub-arrays as children of their parent entry as they show up in the actual array. Point being, it's confusing, it doesn't add any value, and using a foreach is a lot more clear on what you are actually doing.
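To make the point about array_walk_recursive concrete, here is a small sketch. It records how the callback is invoked (value first, key second, leaves only) and then shows a plain recursive foreach that keeps the nesting. arrayToXml() is a hypothetical helper and the nested 'meta' entry is made-up sample data.

```php
<?php
$item = ['rating' => 'a', 'meta' => ['status' => 'pending']];

// array_walk_recursive calls the callback as ($value, $key) and only for
// leaf values, so the 'meta' level itself is never seen by the callback.
$calls = [];
array_walk_recursive($item, function ($value, $key) use (&$calls) {
    $calls[] = [$value, $key];
});

// Hypothetical helper: a recursive foreach that preserves the structure.
function arrayToXml(array $data, SimpleXMLElement $parent): void
{
    foreach ($data as $name => $value) {
        if (is_array($value)) {
            arrayToXml($value, $parent->addChild($name));
        } else {
            $parent->addChild($name, (string) $value);
        }
    }
}

$xml = new SimpleXMLElement('<root/>');
arrayToXml($item, $xml);
$nested = $xml->asXML();
```

Because the callback receives the value before the key, passing addChild directly as the callback creates elements named after the values, which is why the keys and values had to be swapped.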
I've also changed
$item['item_id'] = 'c';
to
$item_id = 'c';
and then added it to the item element as an attribute like:
$xml_item->addAttribute("id", $item_id);
This is consistent with the new schema I outlined earlier.
Finally, instead of passing the XML to DOMDocument, I'm just using
$xml->asXML('ratings.xml');
SimpleXML already removes any extra whitespace, so there is no need to use DOMDocument to achieve this.
Based on some of the counterintuitive parts of your original code, it looks like you may have done a decent amount of copy and pasting to get it going. Which is where most of us start, but it's a good idea to be upfront about things like "I don't understand quite what this code is doing, I just grabbed it from a script that did some of what I need." It will save us all a lot of time and grief if we're not assuming you are using the code you have because you need to or it was a conscious decision, etc, and that we have to work within the constraints of that code.
I hope this gets you off to a good start.
Update
I was messing around with it and came up with the following for updating an existing <item> if an item with id set to $item_id already exists. It's a bit clunky, but I tested it and it works.
This assumes the $item_id and $item array get set as normal, and that the existing XML is retrieved as covered above. I'm providing the lines just before the changes for reference:
$xml = simplexml_load_file('ratings.xml');
//if ratings.xml not found or not valid xml, create clean XML with <root/>
if($xml === false) {
$xml = new SimpleXMLElement('<root/>');
}
//query with xpath for existing item with $item_id
$item_with_id = $xml->xpath("/root/item[@id='{$item_id}']");
// if the xpath returns a result, update that item with new values.
if(count($item_with_id) > 0) {
$xml_item = $item_with_id[0];
foreach($item as $name => $value) {
$xml_item->$name = $value;
}
} else {
// if the xpath returns no results, create new item element.
$xml_item = $xml->addChild("item");
$xml_item->addAttribute("id", $item_id);
foreach($item as $name => $value) {
$xml_item->addChild($name, $value);
}
}
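For the other part of the question, changing a status from pending to approved later on, something along these lines could work. It is only a sketch: setStatus() is a hypothetical helper, and it compares the id attribute in PHP instead of building an xpath string, which sidesteps quoting issues when the id comes from a request.

```php
<?php
// Hypothetical helper; returns true when a matching item was updated.
function setStatus(SimpleXMLElement $xml, string $itemId, string $status): bool
{
    foreach ($xml->item as $item) {
        if ((string) $item['id'] === $itemId) {
            $item->status = $status;   // overwrites the existing <status> value
            return true;
        }
    }
    return false;
}

// In-memory stand-in for the contents of ratings.xml.
$xml = new SimpleXMLElement(
    '<root><item id="c"><rating>a</rating><comment>b</comment><status>pending</status></item></root>'
);

$changed = setStatus($xml, 'c', 'approved');
$missing = setStatus($xml, 'does-not-exist', 'approved');
```

After a successful change, $xml->asXML('ratings.xml') writes the file back as in the code above.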

Get all children from certain xml child element using SimpleXMLElement and xpath

I have xml like:
<root xmlns="urn:test:apis:baseComponents">
<books>
<book>
<name>50 shades of grey</name>
</book>
</books>
<disks>
<disk>
<name>Britney Spears</name>
</disk>
</disks>
</root>
And such php code:
$xml = new SimpleXMLElement($xml);
$books = $xml->books;
$disks = $xml->disks;
$disks->registerXPathNamespace('x', 'urn:test:apis:baseComponents');
$books->registerXPathNamespace('x', 'urn:test:apis:baseComponents');
$b_names = $books->xpath('//x:name');
$b_names contains an array with 2 values instead of 1. The first holds books->book->name, the second holds disks->disk->name.
Can you please explain what I am doing wrong and how I could find the children of only one element?
The reason I am using xpath instead of reading the values manually with SimpleXMLElement is that I don't know in advance which value I want to search for.
Use $books->xpath('.//x:name') to search descendants of your $books variable and not descendants of the root node/document node (which the path //x:name does).
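A short sketch of the difference, using the XML from the question (the variable names $fromDocument and $fromBooks are only for illustration):

```php
<?php
$source = <<<XML
<root xmlns="urn:test:apis:baseComponents">
<books><book><name>50 shades of grey</name></book></books>
<disks><disk><name>Britney Spears</name></disk></disks>
</root>
XML;

$xml = new SimpleXMLElement($source);
$books = $xml->books;
$books->registerXPathNamespace('x', 'urn:test:apis:baseComponents');

// A leading // starts at the document, whatever element xpath() is called on.
$fromDocument = $books->xpath('//x:name');

// A leading .// starts at the context node, here the <books> element.
$fromBooks = $books->xpath('.//x:name');
```

With this input, $fromDocument holds both name elements, while $fromBooks holds only the one below books.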

Delete attribute in XML

I want to completely remove the size="id" attribute from every <door> element.
<?xml version="1.0" encoding="UTF-8"?>
<doors>
<door id="1" entry="3249" size="30"/>
<door id="1041" entry="6523" size="3094"/>
-- and 1000 more....
</doors>
The PHP code:
$xml = new SimpleXMLElement('http://mysite/doors.xml', NULL, TRUE);
$ids_to_delete = array( 1, 1506 );
foreach ($ids_to_delete as $id) {
$result = $xml->xpath( "//door[@size='$id']" );
foreach ( $result as $node ) {
$dom = dom_import_simplexml($node);
$dom->parentNode->removeChild($dom);
}
}
$xml->saveXml();
I get no errors but it does not delete the size attribute. Why?
There are multiple reasons why it does not delete the size attribute. The one that first came to mind was that attributes are not child nodes. A method that removes a child is simply not suited to removing an attribute.
Each element node has an associated set of attribute nodes; the element is the parent of each of these attribute nodes; however, an attribute node is not a child of its parent element.
From: Attribute Nodes - XML Path Language (XPath), bold by me.
However, you don't see an error here, because the $result you have is an empty array. You just don't select any nodes to remove - neither elements nor attributes - with your xpath. That is because there is no such element you look for:
//door[@size='1']
You're searching for the id in the size attribute: No match.
These are the reasons why you get no errors and it does not delete any size attribute: 1.) you don't delete attributes here, 2.) you don't query any elements to delete attributes from.
How to delete attributes in SimpleXML queried by Xpath?
You can remove the attribute nodes by selecting them with an Xpath query and then unset the SimpleXMLElement self-reference:
// all size attributes of all doors
$result = $xml->xpath("//door/@size");
foreach ($result as $node) {
unset($node[0]);
}
In this example, the Xpath expression selects all size attributes of door elements (which is what you ask for in your question), and those attribute nodes are then removed from the XML:
//door/@size
(see Abbreviated Syntax)
Now here the full example:
<?php
/**
* @link https://eval.in/215817
*/
$buffer = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<doors>
<door id="1" entry="3249" size="30"/>
<door id="1041" entry="6523" size="3094"/>
-- and 1000 more....
</doors>
XML;
$xml = new SimpleXMLElement($buffer);
// all size attributes of all doors
$result = $xml->xpath("//door/@size");
foreach ($result as $node) {
unset($node[0]);
}
$xml->saveXml("php://output");
Output (Online Demo):
<?xml version="1.0" encoding="UTF-8"?>
<doors>
<door id="1" entry="3249"/>
<door id="1041" entry="6523"/>
-- and 1000 more....
</doors>
You can do your whole query in DOMDocument using DOMXPath, rather than switching between SimpleXML and DOM:
$dom = new DOMDocument;
$dom->load('my_xml_file.xml');
# initialise an XPath object to act on the $dom object
$xp = new DOMXPath( $dom );
# run the query
foreach ($xp->query( "//door[@size]" ) as $door) {
# remove the attribute
$door->removeAttribute('size');
}
print $dom->saveXML();
Output for the input you supplied:
<?xml version="1.0" encoding="UTF-8"?>
<doors>
<door id="1" entry="3249"/>
<door id="1041" entry="6523"/>
</doors>
If you do want only to remove the size attribute for the IDs in your list, you should use the code:
foreach ($ids_to_delete as $id) {
# searches for elements with a matching ID and a size attribute
foreach ($xp->query("//door[@id='$id' and @size]") as $door) {
$door->removeAttribute('size');
}
}
Your code wasn't working for several reasons:
it looks like your XPath was wrong, since your array is called $ids_to_delete and your XPATH is looking for door elements with the size attribute equal to the value from $ids_to_delete;
you're converting the nodes to DOMDocument objects ($dom = dom_import_simplexml($node);) to do the deletion, but $xml->saveXml();, which I presume you printed somehow, is a SimpleXML object;
you need to remove the element attribute; removeChild removes the whole element.
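If you prefer to stay in SimpleXML for the per-id variant, an attribute can also be removed by unsetting it on the element. A minimal sketch with the two sample doors inlined, assuming the ids are integers as in $ids_to_delete:

```php
<?php
$xml = new SimpleXMLElement(
    '<doors><door id="1" entry="3249" size="30"/><door id="1041" entry="6523" size="3094"/></doors>'
);
$ids_to_delete = [1, 1506];

foreach ($ids_to_delete as $id) {
    // %d forces an integer, so nothing unexpected ends up in the expression.
    foreach ($xml->xpath(sprintf('//door[@id="%d"]', $id)) as $door) {
        unset($door['size']);   // removes only the attribute, the element stays
    }
}

$output = $xml->asXML();
```

Here only the door with id 1 loses its size attribute; id 1506 does not exist in the sample, so that query simply returns nothing.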

XPath multidimensional arrays in PHP

I'm scraping a website that's mostly table based. I have <tr> tags that each represent a category and <td> tags inside these that represent properties of the category.
Using Xpath I get the <tr> fine but with all the <td> info inside it bunched as one string:
$html_string = file_get_contents('testpage.html');
$dom = new DOMDocument();
$dom->loadHTML($html_string);
$xpath = new DOMXpath($dom);
$context_nodes = $xpath->query('//table[@id="category"]/tr[not(starts-with(@id, "category"))]');
And I can get each <td> fine, but with no reference back to its category, with:
$context_nodes = $xpath->query('//table[@id="category"]/tr[not(starts-with(@id, "category"))]/td');
What I would like to do later is be able to reference the properties of each category. I presumed I could do so with $context_nodes[2] etc., thinking that the array it created was a multidimensional string array. This doesn't seem to be the case.
How would I go about creating an array from the xpath info where I can grab a property of a category based on identifying what category I specifically want. E.g. train[1][2]?
Your second attempt is on the right lines. PHP (or, rather, libxml) retains a reference to the context the nodes you selected were returned from, allowing you to do precisely what you need in your case.
XML
<root>
<cat name="category 1">
<prop>prop 1.1</prop>
<prop>prop 1.2</prop>
</cat>
<cat name="category 2">
<prop>prop 2.1</prop>
<prop>prop 2.2</prop>
</cat>
</root>
PHP
$xml = new SimpleXMLElement($xml);
$props = $xml->xpath('cat/prop');
foreach($props as $prop) {
//let's go back up...
$parent_cat = $prop->xpath('parent::*/@name');
echo '<p>'.$prop.' (property of '.$parent_cat[0].')</p>';
}
Notice how we navigate back up the tree, from the point of the prop node, to reference the parent category. Not sure if this is what you meant but hope it helps.
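The same idea also works in the other direction with DOMXPath, which is closer to the code in the question: select the rows first, then query the cells relative to each row to build a nested array. This is a sketch only; the table markup is invented sample data, and //tr is used so that a tbody wrapper, if the page has one, does not break the query.

```php
<?php
// Invented sample markup standing in for testpage.html.
$html_string = '<table id="category">'
    . '<tr id="category-header"><td>Name</td><td>Speed</td></tr>'
    . '<tr id="row1"><td>Train</td><td>Fast</td></tr>'
    . '<tr id="row2"><td>Bus</td><td>Slow</td></tr>'
    . '</table>';

libxml_use_internal_errors(true);   // real-world HTML tends to trigger parser warnings
$dom = new DOMDocument();
$dom->loadHTML($html_string);
$xpath = new DOMXPath($dom);

$rows = [];
foreach ($xpath->query('//table[@id="category"]//tr[not(starts-with(@id, "category"))]') as $tr) {
    $cells = [];
    // The second argument makes ./td relative to the current row.
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}
```

$rows is then a plain two-dimensional array, so $rows[1][0] is the first cell of the second matching row.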
