Find if an element exists in XML using PHP DOM


I have a DOM element, and I want to find out whether a specific sub-element exists.
My node looks like this:
<properties>
<property name="a-random-name" option="option2" >value1</property>
<property name="another-random-name">V2</property>
<property name="yet-another-random-name" option="option5" >K3</property>
</properties>
In PHP it is referenced by a DOM object:
$properties_node;
In another part of the PHP code I want to check whether a datum I'm going to add already exists:
$datum = [ 'name'=>'yet-another-random-name', 'value'=>'K3'];
//NOTE: If other attributes exist I want to keep them
$prop=$dom->createElement('property',$datum['value']);
$prop->setAttribute('name', $datum['name']);
if(prop_list_contains($properties_node,$prop,['name']))
$properties_node->appendChild($prop);
else
echo "not adding element, found\n";
Now I want to write:
/**
@param $properties_node reference to the existing DOM object
@param $prop the new element I want to add
@param $required_matches an array containing the names of the attributes that must match
@return the matching element if a match is found, false otherwise
*/
function prop_list_contains(&$properties_node,$prop,array $required_matches){
// here I have no idea how to parse the document I have
return false;
}
Desired output:
not adding element, found

The easiest way I can think of is to use XPath to check whether the node already exists.
This assumes you only match on one attribute (more is possible, but much more complicated). It first extracts the attribute value from the new node and then uses XPath to check whether a matching value already exists in the current data.
The main thing about this process is to use the correct context node for each search, i.e. the node the XPath expression is evaluated relative to: first the new element (to read its value), then the existing properties node (to look for a match).
function prop_list_contains(DOMXPath $xp, $properties_node, $prop,
array $required_matches){
// Extract value from new node
$compare = $xp->evaluate('string(@'.$required_matches[0].')', $prop);
// Check for the value in the existing data (breaks if the value contains a double quote)
$xpath = 'boolean(./property[@'. $required_matches[0] . ' = "' . $compare . '"])';
return ( $xp->evaluate($xpath, $properties_node) );
}
This also means you need to create the XPath object once and pass it in, which saves creating it on every call:
$xp = new DOMXPath($dom);
Also, as this returns true if the node exists, you need to negate your test with !:
if( ! prop_list_contains($xp, $properties_node,$prop,['name'])) {
$properties_node->appendChild($prop);
}
else {
echo "not adding element, found\n";
}
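If you would rather not splice attribute values into an XPath string (a value containing a double quote would break the expression above), the same check can be written as a plain loop over the child elements, and it extends naturally to several required attributes. This is a minimal sketch under those assumptions, not the answer's method: it keeps the question's contract (return the matching element, or false), ignores attributes not listed in $required_matches, and treats an attribute that is missing on both elements as equal, because getAttribute() returns an empty string for a missing attribute.

```php
<?php
// Sketch (not the answer's XPath version): return the first <property> child of
// $properties_node whose attributes named in $required_matches all equal the
// same attributes on $prop, or false when there is no such child.
function prop_list_contains(DOMElement $properties_node, DOMElement $prop, array $required_matches)
{
    foreach ($properties_node->childNodes as $child) {
        // Skip whitespace text nodes and any element that is not a <property>
        if (!($child instanceof DOMElement) || $child->nodeName !== 'property') {
            continue;
        }
        $all_match = true;
        foreach ($required_matches as $attr) {
            // getAttribute() returns '' for a missing attribute
            if ($child->getAttribute($attr) !== $prop->getAttribute($attr)) {
                $all_match = false;
                break;
            }
        }
        if ($all_match) {
            return $child;
        }
    }
    return false;
}

$dom = new DOMDocument();
$dom->loadXML('<properties>
<property name="a-random-name" option="option2">value1</property>
<property name="another-random-name">V2</property>
<property name="yet-another-random-name" option="option5">K3</property>
</properties>');
$properties_node = $dom->documentElement;

$datum = ['name' => 'yet-another-random-name', 'value' => 'K3'];
$prop = $dom->createElement('property', $datum['value']);
$prop->setAttribute('name', $datum['name']);

if (prop_list_contains($properties_node, $prop, ['name']) === false) {
    $properties_node->appendChild($prop);
} else {
    echo "not adding element, found\n";
}
```

Matching on more attributes is just a longer array, e.g. ['name', 'option']; no string escaping is involved at any point.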

Related

Searching an XML structure but modifying a node higher in the hierarchy

As an example, here is a minimal working example (MWE) of the XML:
<manifest xmlns="http://iuclid6.echa.europa.eu/namespaces/manifest/v1"
xmlns:xlink="http://www.w3.org/1999/xlink">
<general-information>
<title>IUCLID 6 container manifest file</title>
<created>Tue Nov 05 11:04:06 EET 2019</created>
<author>SuperUser</author>
</general-information>
<base-document-uuid>f53d48a9-17ef-48f0-8d0e-76d03007bdfe/f53d48a9-17ef-48f0-8d0e-76d03007bdfe</base-document-uuid>
<contained-documents>
<document id="f53d48a9-17ef-48f0-8d0e-76d03007bdfe/f53d48a9-17ef-48f0-8d0e-76d03007bdfe">
<type>DOSSIER</type>
<name xlink:type="simple"
xlink:href="f53d48a9-17ef-48f0-8d0e-76d03007bdfe_f53d48a9-17ef-48f0-8d0e-76d03007bdfe.i6d"
>Initial submission</name>
<first-modification-date>2019-03-27T06:46:39Z</first-modification-date>
<last-modification-date>2019-03-27T06:46:39Z</last-modification-date>
</document>
</contained-documents>
</manifest>
In this case I want to find each xlink:href attribute and replace the name tag with the contents of the file it refers to, here f53d48a9-17ef-48f0-8d0e-76d03007bdfe_f53d48a9-17ef-48f0-8d0e-76d03007bdfe.i6d (which is also an XML file).
At the moment I use SimpleXML to pull it into an object and then the xml2json library to convert it into a recursive array, but walking it with the normal methods doesn't give me a way to modify a parent node.
I'm not sure how to back up the hierarchy. Any suggestions?

So this is where I am right now: xmlToArray() from xml2json (https://github.com/tamlyn/xml2json) delivers an array of arrays, with the XML attributes brought out into the array too:
<?php
include('./xml2json.php');
$arrayData = [];
$xmlOptions = array(
"namespaceRecursive" => "True"
);
function &i6cArray(& $array){
foreach ($array as $key => $value) {
if(is_array($value)){
//recurse the array of arrays
$value = &i6cArray($value);
$array[$key]=$value;
print_r($value);
} elseif ($key == '@xlink:href') {
// we want to replace the element here with the ref'd file contents
// So we should get name.content = file contents
$tempxml = simplexml_load_file($value);
$tempArrayData = xmlToArray($tempxml);
$array['content']=$tempArrayData;
} else {
//do nothing (at least for now)
}
}
return $array;
}
if (file_exists('manifest.xml')) {
$xml = simplexml_load_file('manifest.xml');
$arrayData = xmlToArray($xml,$xmlOptions);
// walk array - we know the initial thing is an array
$arrayData = &i6cArray($arrayData);
//output result
$jsonString = json_encode($arrayData, JSON_PRETTY_PRINT);
file_put_contents('dossier.json', $jsonString);
} else {
exit("Failed to open manifest.");
}
?>
I would have liked to remove the @xlink attributes, but that isn't critical, so instead I am going to insert a 'content' value holding the referenced XML content.
I would still like to have replaced the entire 'name' key with something.

A few bits of background before we get into the specific solution:
The parts of names before a colon are local aliases for a particular namespace, identified by a URI in an xmlns attribute. They need slightly different handling than non-namespaced names; see this reference question for SimpleXML.
PHP's SimpleXML and DOM extensions both have support for a language called "XPath", which lets you search for elements and attributes based on their parents and/or content.
The DOM is a more complex API than SimpleXML, but has more powerful features, particularly for writing. You can switch between the two using the functions simplexml_import_dom() and dom_import_simplexml().
In this case, we want to find all xlink:href attributes. Looking at the xmlns attributes at the top of the file, we see these are in the http://www.w3.org/1999/xlink namespace. In XPath, you can say "has an attribute" with the syntax [@attributename], so we can use SimpleXML and XPath like this:
$simplexml->registerXpathNamespace('xl', 'http://www.w3.org/1999/xlink');
$elements_with_xlink_hrefs = $simplexml->xpath('//*[@xl:href]');
For each of those, we want the attribute value:
foreach ( $elements_with_xlink_hrefs as $simplexml_element ) {
$filename = (string)$simplexml_element->attributes('http://www.w3.org/1999/xlink')->href;
// ...
We then want to load that file and inject it into the document. This is easier with the DOM, but there is the complication of having to "import" the node so that it's "owned by" the right document.
// load the other file
$other_document = new DOMDocument;
$other_document->load($filename);
// switch to DOM and add it in place
$dom_element = dom_import_simplexml($simplexml_element);
$dom_element->appendChild(
$dom_element->ownerDocument->importNode(
$other_document->documentElement,
true // deep copy: import all descendants, not just the root element
)
);
We can now tidy up and delete the "xlink" attributes:
$dom_element->removeAttributeNs('http://www.w3.org/1999/xlink', 'href');
$dom_element->removeAttributeNs('http://www.w3.org/1999/xlink', 'type');
Once we're done, we can output the whole thing back as one combined XML document:
} // end of foreach loop
echo $simplexml->asXML();
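The fragments above can be put together into one function. The sketch below is a variation rather than the answer's exact code: it uses DOMXPath throughout instead of switching between SimpleXML and DOM, and it replaces each element carrying an xlink:href with the referenced document's root element (what the question asked for) instead of appending inside it. The helper name inline_xlinks and the assumption that each href is a file path relative to a base directory are illustrative only; the demo writes a temporary file to stand in for the .i6d file.

```php
<?php
// Hypothetical helper: replace every element that has an xlink:href attribute
// with a deep copy of the root element of the XML file the href points to.
// Assumes hrefs are file paths relative to $baseDir.
const XLINK_NS = 'http://www.w3.org/1999/xlink';

function inline_xlinks(DOMDocument $doc, string $baseDir): void
{
    $xp = new DOMXPath($doc);
    $xp->registerNamespace('xl', XLINK_NS);
    // Take a snapshot of the matches as a plain array before editing the tree
    foreach (iterator_to_array($xp->query('//*[@xl:href]')) as $element) {
        $href = $element->getAttributeNS(XLINK_NS, 'href');
        $other = new DOMDocument();
        if (!$other->load($baseDir . DIRECTORY_SEPARATOR . $href)) {
            continue; // leave the element as-is if the file cannot be loaded
        }
        // The second argument (true) makes importNode copy all descendants too
        $imported = $doc->importNode($other->documentElement, true);
        $element->parentNode->replaceChild($imported, $element);
    }
}

// Demo: a temporary file stands in for the referenced .i6d file
$file = tempnam(sys_get_temp_dir(), 'i6d');
file_put_contents($file, '<dossier><title>Initial submission</title></dossier>');

$doc = new DOMDocument();
$doc->loadXML(
    '<manifest xmlns="http://iuclid6.echa.europa.eu/namespaces/manifest/v1"'
    . ' xmlns:xlink="http://www.w3.org/1999/xlink">'
    . '<document><type>DOSSIER</type>'
    . '<name xlink:type="simple" xlink:href="' . basename($file) . '">Initial submission</name>'
    . '</document></manifest>'
);
inline_xlinks($doc, dirname($file));
unlink($file);
echo $doc->saveXML();
```

Because the name element is replaced as a whole, its xlink attributes disappear with it, so no separate removeAttributeNS() calls are needed in this variant.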

Unable to find tbl child nodes in MS Word doc using xpath or $node->children()

I have a project where I take a .docx file, unzip it, and run through the document.xml file using XPath to find all table elements. Within each table I want to identify specific tables by their tblCaption (table caption) value, then run through the table and find the table cells. I will then change the background color of cells by changing the w:fill value with a string replace. We're doing it this way because we want to enter the tables manually in Word and then change them, rather than generating tables dynamically with a library like PHPDocx. So far I have used SimpleXML with XPath to find all the tables in the document, loop through them, and test for the existence and value of the tblCaption node. If there is a match, I will assign a background color to each cell, using the cell text to identify the cell node. I can find all the tables using XPath. I have tried to find the child nodes of each table using both $tblNode->children() and $xpath:
$xml = simplexml_load_file(APPPATH.TEMPLATE_UPLOAD_PATH.'xmltest/word/document.xml');
$namespaces = $xml->getDocNamespaces(true);
foreach ($namespaces as $prefix => $ns) {
$prefix = $prefix == '' ? 'default' : $prefix;
$xml->registerXPathNamespace($prefix, $ns);
}
$nodes = $xml->xpath("/w:document/w:body//w:tbl");
foreach($nodes as $node) {
$children = $node->xpath("/w:tblCaption");
echo count($children) . '<br />';
//$children = $node->children();
//echo count($children) . '<br />';
}
I would eventually like to use:
$children = $node->xpath("/w:tblCaption[@val='whatever']"); to return a tblCaption node only if it exists and has a specific value.
At the moment, zero child nodes are returned for each tbl node.
Any ideas?

One problem I can identify, without being able to see sample XML, is that your code tries to get an element relative to another element using an absolute XPath expression (one that starts with /, which refers to the document root). You should at least add a . at the beginning of your XPath, or remove the / completely. Both result in an XPath that looks for a child element of that name, relative to the current $node:
foreach($nodes as $node) {
$children = $node->xpath("w:tblCaption");
echo count($children) . '<br />';
}
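A further point that may explain the zero results, offered as an assumption since the question doesn't show the XML: in WordprocessingML the caption is normally stored as <w:tblCaption w:val="..."/> inside the table's <w:tblPr> properties element, not as a direct child of <w:tbl>, and its value is in the namespaced w:val attribute. In that case the relative path needs to go through w:tblPr. The sketch below uses a tiny hand-written stand-in for document.xml (the captions "prices" and "other" are made up) to show the relative lookup the asker wanted.

```php
<?php
$W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Hand-written stand-in for word/document.xml (hypothetical content)
$xml = simplexml_load_string(
    '<w:document xmlns:w="' . $W_NS . '"><w:body>'
    . '<w:tbl><w:tblPr><w:tblCaption w:val="prices"/></w:tblPr>'
    . '<w:tr><w:tc><w:p/></w:tc></w:tr></w:tbl>'
    . '<w:tbl><w:tblPr><w:tblCaption w:val="other"/></w:tblPr></w:tbl>'
    . '</w:body></w:document>'
);
$xml->registerXPathNamespace('w', $W_NS);

$matching = [];
foreach ($xml->xpath('/w:document/w:body//w:tbl') as $tbl) {
    // SimpleXML keeps a separate XPath context per object, so register the prefix here too
    $tbl->registerXPathNamespace('w', $W_NS);
    // No leading "/": the path is evaluated relative to this w:tbl element
    $captions = $tbl->xpath("w:tblPr/w:tblCaption[@w:val='prices']");
    if (!empty($captions)) {
        $matching[] = $tbl;
    }
}
echo count($matching), "\n"; // number of tables captioned "prices"
```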

I've ditched this altogether and I'm using the OpenTBS plugin to change table appearance in Word. It's much more powerful than messing around with the XML elements yourself. I haven't tried the above idea, but thanks anyway.

Traversing a DOM with PHP

I want to find an element within the DOM of a specific site.
Inside the DOM there is a tag called "cufon".
Assume the URL is http://www.xyzw.com/
The code I use is the following:
$dom = new DOMDocument();
@$dom->loadHTMLFile('http://www.xyzw.com/');
$teams = $dom->getElementById('cufon');
At this point the $teams variable is supposed to contain all of the cufon elements inside the DOM, but it contains nothing. If I search for "div" elements instead, it does find them all.
What is the problem?

If, as you say, there is a tag called cufontext, then trying to find a collection of nodes by a single ID would return at most one element (IDs need to be unique), and getElementById() matches the id attribute, not the tag name. Perhaps you want to find all elements with the specified tag name instead:
$dom = new DOMDocument();
$dom->loadHTMLFile('http://www.xyzw.com/');
$teams = $dom->getElementsByTagName('cufontext');
if( $teams ){
foreach($teams as $team){
/* do stuff */
}
}
As we have not been given the actual URL involved, I had to test like this:
/* random url - just happened to be open in browser just now */
$url='http://www.interparcel.com/';
/* the tag to search for */
$tag='div';
$dom = new DOMDocument();
$dom->loadHTMLFile( $url );
$teams = $dom->getElementsByTagName( $tag );
/* As pointed out by @Pieter, $teams is always a DOMNodeList object (so always truthy); check its length too */
if( $teams && $teams->length > 0 ){
foreach($teams as $team){
echo $team->nodeValue;
}
}
This will spit out lots of content from the remote URL, so if you are unable to find a tag called cufontext, I'd suggest confirming that tags of that name actually exist in the page.
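Because the real URL isn't available, the sketch below runs the same getElementsByTagName() lookup against an inline HTML string (the markup is invented for illustration) instead of fetching a page. It also shows two details from the discussion above: the result is always a DOMNodeList, so its length is the meaningful check, and libxml's HTML parser reports non-HTML tags such as <cufon> as errors, which libxml_use_internal_errors() collects quietly as an alternative to the @ operator.

```php
<?php
// Stand-in for the remote page (hypothetical markup)
$html = '<html><body><div><cufon>Team A</cufon><cufon>Team B</cufon></div></body></html>';

$dom = new DOMDocument();
// Collect libxml's "invalid tag" warnings instead of printing them
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$teams = $dom->getElementsByTagName('cufon'); // a DOMNodeList, possibly empty
$names = [];
foreach ($teams as $team) {
    $names[] = $team->textContent;
}
echo $teams->length, ': ', implode(', ', $names), "\n";
```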

PHP return value after XML exploration

I have a PHP array containing a lot of XML user-file URLs:
$tab_users[0]=john.xml
$tab_users[1]=chris.xml
$tab_users[n...]=phil.xml
For each user, a <zoom> tag is filled or not, depending on whether the user filled it in:
john.xml = <zoom>Some content here</zoom>
chris.xml = <zoom/>
phil.xml = <zoom/>
I'm trying to explore the users' data and display the first filled <zoom> tag, but randomized: each time you reload the page, the <div id="zoom"> content is different.
$rand=rand(0,$n); // $n is the number of users
$datas_zoom=zoom($n,$rand);
My PHP function:
function zoom($n,$rand) {
global $tab_users;
$datas_user=new SimpleXMLElement($tab_users[$rand],null,true);
$tag=$datas_user->xpath('/user');
//if zoom found
if($tag[0]->zoom !='') {
$txt_zoom=$tag[0]->zoom;
}
// ... some other stuff here
// no "zoom" value found
if ($txt_zoom =='') {
echo 'RAND='.$rand.' XML='.$tab_users[$rand].'<br />';
$datas_zoom=zoom($r,$n,$rand); } // random zoom fct again and again till...
}
else {
echo 'ZOOM='.$txt_zoom.'<br />';
return $txt_zoom; // we got it!
}
}
echo '<br />Return='.$datas_zoom;
The problem is: when, by chance, the first XML file explored contains "zoom" information, the function returns it, but if not, nothing is returned. An example of results when the first one happens to be the right one:
// for RAND=0, XML=john.xml
ZOOM=Anything here
Return=Some content here // we're lucky
Unlucky:
RAND=1 XML=chris.xml
RAND=2 XML=phil.xml
// then for RAND=0 and XML=john.xml
ZOOM=Anything here
// content found but Return is empty
Return=
What's wrong?

I suggest importing the values into a database table, or generating a single local file, or something like that, so that you don't have to open and parse all the XML files on each request.
Reading multiple files is a lot slower than reading a single file. And with a database, even the random logic can be moved to SQL.
You are currently using SimpleXML, but fetching a single value from an XML document is actually easier with DOM. SimpleXMLElement::xpath() only supports XPath expressions that return a node list, but DOMXPath::evaluate() can return a scalar value directly:
$document = new DOMDocument();
$document->load($xmlFile);
$xpath = new DOMXpath($document);
$zoomValue = $xpath->evaluate('string(//zoom[1])');
//zoom[1] selects every zoom element that is the first zoom child of its parent, as a node list. Casting that list to a string with string() returns the text content of the first node in document order, or an empty string if the list is empty (no node found).
For the sake of this example, assume that you generated an XML file like this:
<zooms>
<zoom user="u1">z1</zoom>
<zoom user="u2">z2</zoom>
</zooms>
In this case you can use Xpath to fetch all zoom nodes and get a random node from the list.
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$zooms = $xpath->evaluate('//zoom');
$zoom = $zooms->item(mt_rand(0, $zooms->length - 1));
var_dump(
[
'user' => $zoom->getAttribute('user'),
'zoom' => $zoom->textContent
]
);
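Applied to the original problem, the DOM approach could look like the following minimal sketch: visit the files in random order, read each <zoom> with DOMXPath::evaluate('string(...)'), and stop at the first non-empty one. The function name random_zoom and the /user/zoom path (taken from the question's xpath('/user')) are assumptions; temporary files stand in for john.xml, chris.xml and phil.xml. Because each file is visited at most once, the loop cannot run forever when no user has filled in a zoom.

```php
<?php
// Hypothetical helper: visit the user files in random order and return the first
// non-empty <zoom> text, or null if every file's <zoom> is empty or missing.
function random_zoom(array $files): ?string
{
    shuffle($files); // random order; each file is visited at most once
    foreach ($files as $file) {
        $document = new DOMDocument();
        if (!$document->load($file)) {
            continue; // skip files that cannot be loaded
        }
        $xpath = new DOMXPath($document);
        // string() yields the first matching node's text, or '' if there is none
        $zoom = trim($xpath->evaluate('string(/user/zoom)'));
        if ($zoom !== '') {
            return $zoom;
        }
    }
    return null;
}

// Demo: temporary files stand in for john.xml, chris.xml and phil.xml
$files = [];
foreach (['Some content here', '', ''] as $content) {
    $file = tempnam(sys_get_temp_dir(), 'user');
    file_put_contents($file, '<user><zoom>' . htmlspecialchars($content) . '</zoom></user>');
    $files[] = $file;
}
echo 'Return=' . random_zoom($files) . "\n";
```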

Your main issue is that you are not returning any value when there is no zoom found.
$datas_zoom=zoom($r,$n,$rand); // no return keyword here!
When you're using recursion, you usually want to "chain" return values all the way up until you find the one you need. $datas_zoom is not a global variable and it will not "leak out" of your function. Please read PHP's variable scope documentation for more info.
Then again, you're calling the zoom function with three arguments ($r,$n,$rand) while it can only handle two ($n and $rand). Also, $r is undefined, $n is not used at all, and you are most likely trying to use the same $rand value again and again, which obviously cannot work.
Also note that there are too many closing braces in your code.
I think the best approach for your problem is to shuffle the array once and then pop elements off it one at a time, without recursion (which should be slightly faster):
function zoom($tab_users) {
// shuffle the array once
shuffle($tab_users);
// init variable ('' means "not found yet")
$txt_zoom = '';
// repeat until a zoom is found or there
// are no more elements in the array
while ($txt_zoom === '' && !empty($tab_users)) {
$rand = array_pop($tab_users);
$datas_user = new SimpleXMLElement($rand, 0, true);
$tag = $datas_user->xpath('/user');
// if zoom found, cast it to a string so the loop test is reliable
if ($tag && (string) $tag[0]->zoom !== '') {
$txt_zoom = (string) $tag[0]->zoom;
}
}
return $txt_zoom !== '' ? $txt_zoom : null;
}
$datas_zoom = zoom($tab_users); // your zoom is here!
Please read more about PHP variable scope, PHP functions and recursion.

There's no reason for recursion. A simple loop would do.
$datas_user=new SimpleXMLElement($tab_users[$rand],null,true);
$tag=$datas_user->xpath('/user');
$max = count($tag); // xpath() returns a PHP array, which has no ->length
while(true) {
$test_index = rand(0, $max - 1); // rand() is inclusive at both ends
if ($tag[$test_index]->zoom != "") {
break;
}
}
Of course, you might want to add a bit more logic to handle the case where NO zooms have text set, in which case the above would be an infinite loop.

PHP: How to change part of XML using DomElement

I am trying to make a function that changes part of an XML document using XPath. I used part of someone else's post:
/*********************************************************************
Function to replace part of an XML
**********************************************************************/
function replacePartofXML($element, $methodName, $methodValue, $xml, $newPartofXML)
{
$xpathstring = "//" . $element . "[@$methodName = \"$methodValue\"]";
$xml->xpath($xpathstring);
//$domToChange = dom_import_simplexml($xml->xpath($xpathstring));
$domToChange = dom_import_simplexml($xml);
$domReplace = dom_import_simplexml($newPartofXML);
$nodeImport = $domToChange->ownerDocument->importNode($domReplace, TRUE);
$domToChange->parentNode->replaceChild($nodeImport, $domToChange);
return($xml);
}
What I want to do is return the modified XML. I can't use dom_import_simplexml($xml->node->node), as my XML has many repeating elements (they have different IDs, which is why I am trying to use XPath).
The commented-out line does not work either, as xpath() returns an array and dom_import_simplexml() cannot import arrays.
Thanks for your input.

You can take the first element returned by xpath() if you believe the target element is unique (checking for an empty result is omitted here):
$domToChange = dom_import_simplexml($xml->xpath($xpathstring)[0]);
or iterate through the return value of xpath() and replace one by one.
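Putting the question's function and this answer together, a minimal sketch might look like the following. It is an illustration rather than a drop-in fix: the parameters are lightly renamed, an empty XPath result returns false, and the attribute value is assumed not to contain a double quote, since it is spliced into the XPath string. The demo XML with repeated <item id="..."> elements is made up.

```php
<?php
// Sketch: replace the first <$element $attrName="$attrValue"> in $xml with a deep
// copy of $newPart. Returns the (modified) $xml, or false when nothing matches.
function replacePartofXML(string $element, string $attrName, string $attrValue,
                          SimpleXMLElement $xml, SimpleXMLElement $newPart)
{
    // Assumes $attrValue contains no double quote (it is spliced into the XPath)
    $matches = $xml->xpath("//{$element}[@{$attrName}=\"{$attrValue}\"]");
    if (empty($matches)) {
        return false;
    }
    $domToChange = dom_import_simplexml($matches[0]);
    $domReplace  = dom_import_simplexml($newPart);
    // importNode(..., true) copies the new fragment, children included, into $xml's document
    $nodeImport  = $domToChange->ownerDocument->importNode($domReplace, true);
    $domToChange->parentNode->replaceChild($nodeImport, $domToChange);
    return $xml; // $xml wraps the same underlying document, so it reflects the change
}

$xml = new SimpleXMLElement('<items><item id="1">old</item><item id="2">keep</item></items>');
$new = new SimpleXMLElement('<item id="1">new</item>');
echo replacePartofXML('item', 'id', '1', $xml, $new)->asXML();
```

To replace every match instead of only the first, loop over $matches and repeat the import/replace steps for each element.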
