Comparing 2 XML Files using PHP - php

I want to compare 2 big xml files and retrieve the differences. Like ExamXML and DiffDog do. The solution I found was cycling through all child nodes of each file simultaneously and check if they are equal. But I have no idea how to achieve that... How can I loop through all child nodes and their properties? How can I check if the first element of the first file is equal to the first element of the second file, the second element of the first file is equal to the second element of the second file and so on?
Do yo have a better idea to compare 2 xml files?

I was looking for something to compare two XML like you, and I found this solution that works very well.
http://www.jevon.org/wiki/Comparing_Two_SimpleXML_Documents
I hope that helps to someone.

Have you looked at using XPath at all? Seems like an easy way to grab all of the child nodes. Then you'd be able to loop through the nodes and compare the attributes/textContent.

This might be a very alternative solution for you but this is how I would do it.
First, I'd try to get the format into something much more manageable like an array so I would convert the XML to an array.
http://www.bytemycode.com/snippets/snippet/445/
This is some simple code to do just that.
Then PHP has an array_diff() function that can show you the differences.
http://www.php.net/manual/en/function.array-diff.php
This may or may not work for you considering what you need to do with the differences but if you're looking to just identify and act upon them this might be a very quick solution to your problem.

Try the xmldiff extension
http://pecl.php.net/xmldiff
It's based on the same library as the perl module DifferenceMarkup, you'll get a diff XML document and can even merge then.

//Child by Child XML files comparison in PHP
//Returns an array of non matched children in variable &$reasons
$reasons = array();
$xml1 = new SimpleXMLElement(file_get_contents($xmlFile1));
$xml2 = new SimpleXMLElement(file_get_contents($xmlFile2));
$result = XMLFileComparison($xml1, $xml2, $reasons);
/**
* XMLFileComparison
* Discription :- This function compares XML files. Returns array
* of nodes do not match in pass by reference parameter
* #param $xml1 Object Node Object
* #param $xml2 Object Node Object
* #param &$reasons Array pass by reference
* returns array of nodes do not match
* #param $strict_comparison Bool default False
* #return bool <b>TRUE</b> on success or array of strings on failure.
*/
function XMLFileComparison(SimpleXMLElement $xml1, SimpleXMLElement $xml2, &$reasons, $strict_comparison = false)
{
static $str;
// compare text content
if ($strict_comparison) {
if ("$xml1" != "$xml2") return "Values are not equal (strict)";
} else {
if (trim("$xml1") != trim("$xml2"))
{
return " Values are not equal";
}
}
// get all children
$XML1ChildArray = array();
$XML2ChildArray = array();
foreach ($xml1->children() as $b) {
if (!isset($XML1ChildArray[$b->getName()]))
$XML1ChildArray[$b->getName()] = array();
$XML1ChildArray[$b->getName()][] = $b;
}
foreach ($xml2->children() as $b) {
if (!isset($XML2ChildArray[$b->getName()]))
$XML2ChildArray[$b->getName()] = array();
$XML2ChildArray[$b->getName()][] = $b;
}
//print_r($XML1ChildArray);
//print_r($XML2ChildArray);
// cycle over children
if (count($XML1ChildArray) != count($XML2ChildArray)) return "mismatched children count";// Second File has less or more children names (we don't have to search through Second File's children too)
foreach ($XML1ChildArray as $child_name => $children) {
if (!isset($XML2ChildArray[$child_name])) return "Second file does not have child $child_name"; // Second file has none of this child
if (count($XML1ChildArray[$child_name]) != count($XML2ChildArray[$child_name])) return "mismatched $child_name children count"; // Second file has less or more children
print_r($child_name);
foreach ($children as $child) {
// do any of search2 children match?
$found_match = false;
//$reasons = array();
foreach ($XML2ChildArray[$child_name] as $id => $second_child) {
$str = $str.$child_name.($id+1)."/"; // Adding 1 to $id to match with XML data nodes numbers
//print_r($child, $second_child);
// recursive function call until reach to the end of node
if (($r = XMLFileComparison($child, $second_child, $reasons, $strict_comparison)) === true) {
// found a match: delete second
$found_match = true;
unset($XML2ChildArray[$child_name][$id]);
$str = str_replace($child_name.($id+1)."/", "", $str);
break;
}
else {
unset($XML2ChildArray[$child_name][$id]);
$reasons[$str] = $r;
$str = str_replace($child_name.($id+1)."/", "", $str);
break;
}
}
}
}
return True;
}

Related

How to display unique results from an array with two conditions

In $data I have the following values:
Result
[{"id":2,"author":"example#mail.com","id_orders":502},
{"id":2,"author":"example#mail.com","id_orders":503},
{"id":2,"author":"example#mail.com","id_orders":505},
{"id":3,"author":"second-example#mail.com","id_orders":502},
{"id":3,"author":"second-example#mail.com","id_orders":503},
{"id":3,"author":"second-example#mail.com","id_orders":505},
{"id":4,"author":"third-example#mail.com","id_orders":502},
{"id":4,"author":"third-example#mail.com","id_orders":503},
{"id":4,"author":"third-example#mail.com","id_orders":505}]
I want unique results for id and id_orders. I want 3 out of these 9 results. I have tried this, but it helps on one id_orders condition.
PHP code
$result = json_decode($data, true);
$unique_array = [];
foreach($result as $element) {
$hash = $element['id_orders'];
$unique_array[$hash] = $element;
}
$data = array_values($unique_array);
Do you know how it can be different to make it work for two?
You can do it by keeping track of the values that were already used. Disclaimer: this solution will only produce a clear result for cases where the number of unique values for both criteria is the same.
$uniqueArray = [];
$usedValues = [
'id' => [],
'id_orders' => [],
];
foreach ($result as $element) {
if (!in_array($element['id'], $usedValues['id']) && !in_array($element['id_orders'], $usedValues['id_orders'])) {
$uniqueArray[] = $element;
$usedValues['id'][] = $element['id'];
$usedValues['id_orders'][] = $element['id_orders'];
}
}
Basically, what's happening here is that we're using $usedValues to store all the unique values we've already used and comparing against it using in_array. When we iterate through the objects, any object with an id or id_orders that has already been used will be skipped. The pairings will be done in order of appearance in the array.
I've gone an extra mile to try and make this code a bit more generic:
* Finds elements with a unique combination of values under given keys.
*
* Assumes all elements in the array are arrays themselves and that the
* subarrays have the same structure. Assumes subarray elements are not
* objects (uses strict comparison).
*/
function uniqueCombination(array $arrayToFilter, array $keysToFilterOn): array
{
if (empty($arrayToFilter) || empty($keysToFilterOn)) {
throw new \InvalidArgumentException(
'Parameters of uniqueCombination must not be empty arrays'
);
}
// get the keys from the first element; others are assumed to be the same
$keysOfElements = array_keys(reset($arrayToFilter));
$keysPresentInBoth = array_intersect($keysToFilterOn, $keysOfElements);
// no point in running the algorithm if none of the keys are
// actually found in our array elements
if (empty($keysPresentInBoth)) {
return [];
}
$result = [];
$usedValues = array_combine(
$keysPresentInBoth,
array_fill(0, count($keysPresentInBoth), [])
);
foreach ($arrayToFilter as $element) {
if (!isAlreadyUsed($usedValues, $element)) {
$result[] = $element;
foreach ($keysPresentInBoth as $keyToUse) {
$usedValues[$keyToUse][] = $element[$keyToUse];
}
}
}
return $result;
}
function isAlreadyUsed(array $usedValues, array $element): bool
{
foreach ($usedValues as $usedKey => $usedValue) {
if (in_array($element[$usedKey], $usedValue)) {
return true;
}
}
return false;
}
In its core, this is the same algorithm, but made dynamic. It allows a variable number of keys to filter on (that's why they're passed separately as an argument), so the $usedValues array is created dynamically (using the first element's keys as its own keys, filled with empty arrays) and all the keys must be compared in loops (hence the separate function to check if an element's value had already been used).
It could probably be tweaked here or there as I haven't tested it thoroughly, but should provide satisfactory results for most structures.

converting a posted php array to a multidimensional array [duplicate]

So I wrote a class that can parse XML documents and create SQL queries from it to update or insert new rows depending on the settings.
Because the script has to work with any amount of nested blocks, the array that I'm putting all the values in has its path dynamically created much like the following example:
$path = array('field1','field2');
$path = "['".implode("']['",$path)."']";
eval("\$array".$path."['value'] = 'test';");
Basically $path contains an array that shows how deep in the array we currently are, if $path contains for instance the values main_table and field I want set $array['main_table']['field']['value'] to 'test'
As you can see I am currently using eval to do this, and this works fine. I am just wondering if there is a way to do this without using eval.
Something like
$array{$path}['value'] = 'test'; but then something that actually works.
Any suggestions?
EDIT
The reason I'm looking for an alternative is because I think eval is bad practice.
SECOND EDIT
Changed actual code to dummy code because it was causing a lot of misunderstandings.
Use something like this:
/**
* Sets an element of a multidimensional array from an array containing
* the keys for each dimension.
*
* #param array &$array The array to manipulate
* #param array $path An array containing keys for each dimension
* #param mixed $value The value that is assigned to the element
*/
function set_recursive(&$array, $path, $value)
{
$key = array_shift($path);
if (empty($path)) {
$array[$key] = $value;
} else {
if (!isset($array[$key]) || !is_array($array[$key])) {
$array[$key] = array();
}
set_recursive($array[$key], $path, $value);
}
}
You can bypass the whole counter business with the array append operator:
$some_array[] = 1; // pushes '1' onto the end of the array
As for the whole path business, I'm assuming that's basically an oddball representation of an xpath-like route through your xml document... any reason you can't simply use that string as an array key itself?
$this->BLOCKS['/path/to/the/node/you're/working/on][] = array('name' => $name, 'target' => $target);
You can use a foreach with variable variables.
// assuming a base called $array, and the path as in your example:
$path = array('field1','field2');
$$path = $array;
foreach ($path as $v) $$path = $$path[$v];
$$path['value'] = 'test';
Short, simple, and much better than eval.

PHP Parse HTML file [duplicate]

I'm parsing some XML with PHP DOM extension in order to store the data in some other form. Quite unsurprisingly, when I parse an element I pretty often need to obtain all children elements of some name. There is the method DOMElement::getElementsByTagName($name), but it returns all descendants with that name, not just immediate children. There is also the property DOMNode::$childNodes but (1) it contains node list, not element list, and even if I managed to turn the list items into elements (2) I'd still need to check all of them for the name. Is there really no elegant solution to get only the children of some specific name or am I missing something in the documentation?
Some illustration:
<?php
DOMDocument();
$document->loadXML(<<<EndOfXML
<a>
<b>1</b>
<b>2</b>
<c>
<b>3</b>
<b>4</b>
</c>
</a>
EndOfXML
);
$bs = $document
->getElementsByTagName('a')
->item(0)
->getElementsByTagName('b');
foreach($bs as $b){
echo $b->nodeValue . "\n";
}
// Returns:
// 1
// 2
// 3
// 4
// I'd like to obtain only:
// 1
// 2
?>
simple iteration process
$parent = $p->parentNode;
foreach ( $parent->childNodes as $pp ) {
if ( $pp->nodeName == 'p' ) {
if ( strlen( $pp->nodeValue ) ) {
echo "{$pp->nodeValue}\n";
}
}
}
An elegant manner I can imagine would be using a FilterIterator that is suitable for the job. Exemplary one that is able to work on such a said DOMNodeList and (optionally) accepting a tagname to filter for as an exemplary DOMElementFilter from the Iterator Garden does:
$a = $doc->getElementsByTagName('a')->item(0);
$bs = new DOMElementFilter($a->childNodes, 'b');
foreach($bs as $b){
echo $b->nodeValue . "\n";
}
This will give the results you're looking for:
1
2
You can find DOMElementFilter in the Development branch now. It's perhaps worth to allow * for any tagname as it's possible with getElementsByTagName("*") as well. But that's just some commentary.
Hier is a working usage example online: https://eval.in/57170
My solution used in a production:
Finds a needle (node) in a haystack (DOM)
function getAttachableNodeByAttributeName(\DOMElement $parent = null, string $elementTagName = null, string $attributeName = null, string $attributeValue = null)
{
$returnNode = null;
$needleDOMNode = $parent->getElementsByTagName($elementTagName);
$length = $needleDOMNode->length;
//traverse through each existing given node object
for ($i = $length; --$i >= 0;) {
$needle = $needleDOMNode->item($i);
//only one DOM node and no attributes specified?
if (!$attributeName && !$attributeValue && 1 === $length) return $needle;
//multiple nodes and attributes are specified
elseif ($attributeName && $attributeValue && $needle->getAttribute($attributeName) === $attributeValue) return $needle;
}
return $returnNode;
}
Usage:
$countryNode = getAttachableNodeByAttributeName($countriesNode, 'country', 'iso', 'NL');
Returns DOM element from parent countries node by specified attribute iso using country ISO code 'NL', basically like a real search would do. Find a certain country by it's name in an array / object.
Another usage example:
$productNode = getAttachableNodeByAttributeName($products, 'partner-products');
Returns DOM node element containing only single (root) node, without searching by any attribute.
Note: for this you must make sure that root nodes are unique by elements' tag name, e.g. countries->country[ISO] - countries node here is unique and parent to all child nodes.

PHP DOM: How to get child elements by tag name in an elegant manner?

I'm parsing some XML with PHP DOM extension in order to store the data in some other form. Quite unsurprisingly, when I parse an element I pretty often need to obtain all children elements of some name. There is the method DOMElement::getElementsByTagName($name), but it returns all descendants with that name, not just immediate children. There is also the property DOMNode::$childNodes but (1) it contains node list, not element list, and even if I managed to turn the list items into elements (2) I'd still need to check all of them for the name. Is there really no elegant solution to get only the children of some specific name or am I missing something in the documentation?
Some illustration:
<?php
DOMDocument();
$document->loadXML(<<<EndOfXML
<a>
<b>1</b>
<b>2</b>
<c>
<b>3</b>
<b>4</b>
</c>
</a>
EndOfXML
);
$bs = $document
->getElementsByTagName('a')
->item(0)
->getElementsByTagName('b');
foreach($bs as $b){
echo $b->nodeValue . "\n";
}
// Returns:
// 1
// 2
// 3
// 4
// I'd like to obtain only:
// 1
// 2
?>
simple iteration process
$parent = $p->parentNode;
foreach ( $parent->childNodes as $pp ) {
if ( $pp->nodeName == 'p' ) {
if ( strlen( $pp->nodeValue ) ) {
echo "{$pp->nodeValue}\n";
}
}
}
An elegant manner I can imagine would be using a FilterIterator that is suitable for the job. Exemplary one that is able to work on such a said DOMNodeList and (optionally) accepting a tagname to filter for as an exemplary DOMElementFilter from the Iterator Garden does:
$a = $doc->getElementsByTagName('a')->item(0);
$bs = new DOMElementFilter($a->childNodes, 'b');
foreach($bs as $b){
echo $b->nodeValue . "\n";
}
This will give the results you're looking for:
1
2
You can find DOMElementFilter in the Development branch now. It's perhaps worth to allow * for any tagname as it's possible with getElementsByTagName("*") as well. But that's just some commentary.
Hier is a working usage example online: https://eval.in/57170
My solution used in a production:
Finds a needle (node) in a haystack (DOM)
function getAttachableNodeByAttributeName(\DOMElement $parent = null, string $elementTagName = null, string $attributeName = null, string $attributeValue = null)
{
$returnNode = null;
$needleDOMNode = $parent->getElementsByTagName($elementTagName);
$length = $needleDOMNode->length;
//traverse through each existing given node object
for ($i = $length; --$i >= 0;) {
$needle = $needleDOMNode->item($i);
//only one DOM node and no attributes specified?
if (!$attributeName && !$attributeValue && 1 === $length) return $needle;
//multiple nodes and attributes are specified
elseif ($attributeName && $attributeValue && $needle->getAttribute($attributeName) === $attributeValue) return $needle;
}
return $returnNode;
}
Usage:
$countryNode = getAttachableNodeByAttributeName($countriesNode, 'country', 'iso', 'NL');
Returns DOM element from parent countries node by specified attribute iso using country ISO code 'NL', basically like a real search would do. Find a certain country by it's name in an array / object.
Another usage example:
$productNode = getAttachableNodeByAttributeName($products, 'partner-products');
Returns DOM node element containing only single (root) node, without searching by any attribute.
Note: for this you must make sure that root nodes are unique by elements' tag name, e.g. countries->country[ISO] - countries node here is unique and parent to all child nodes.

iterating over unknown XML structure with PHP (DOM)

I want to write a function that parses a (theoretically) unknown XML data structure into an equivalent PHP array.
Here is my sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<content>
<title>Sample Text</title>
<introduction>
<paragraph>This is some rudimentary text</paragraph>
</introduction>
<description>
<paragraph>Here is some more text</paragraph>
<paragraph>Even MORE text</paragraph>
<sub_section>
<sub_para>This is a smaller, sub paragraph</sub_para>
<sub_para>This is another smaller, sub paragraph</sub_para>
</sub_section>
</description>
</content>
I modified this DOM iterating function from devarticles:
$data = 'path/to/xmldoc.xml';
$xmlDoc = new DOMDocument(); #create a DOM element
$xmlDoc->load( $data ); #load data into the element
$xmlRoot = $xmlDoc->firstChild; #establish root
function xml2array($node)
{
if ($node->hasChildNodes())
{
$subNodes = $node->childNodes;
foreach ($subNodes as $subNode)
{
#filter node types
if (($subNode->nodeType != 3) || (($subNode->nodeType == 3)))
{
$arraydata[$subNode->nodeName]=$subNode->nodeValue;
}
xml2array($subNode);
}
}
return $arraydata;
}
//The getNodesInfo function call
$xmlarray = xml2array($xmlRoot);
// print the output - with a little bit of formatting for ease of use...
foreach($xmlarray as $xkey)
{
echo"$xkey<br/><br/>";
}
Now, because of the way I'm passing the elements to the array I'm overwriting any elements that share a node name (since I ideally want to give the keys the same names as their originating nodes). My recursion isn't great... However, even if I empty the brackets - the second tier of nodes are still coming in as values on the first tier (see the text of the description node).
Anyone got any ideas how I can better construct this?
You might be better off just snagging some code off the net
http://www.bin-co.com/php/scripts/xml2array/
/**
* xml2array() will convert the given XML text to an array in the XML structure.
* Link: http://www.bin-co.com/php/scripts/xml2array/
* Arguments : $contents - The XML text
* $get_attributes - 1 or 0. If this is 1 the function will get the attributes as well as the tag values - this results in a different array structure in the return value.
* $priority - Can be 'tag' or 'attribute'. This will change the way the resulting array sturcture. For 'tag', the tags are given more importance.
* Return: The parsed XML in an array form. Use print_r() to see the resulting array structure.
* Examples: $array = xml2array(file_get_contents('feed.xml'));
* $array = xml2array(file_get_contents('feed.xml', 1, 'attribute'));
*/
function xml2array($contents, $get_attributes=1, $priority = 'tag') {
You might be interested in SimpleXML or xml_parse_into_struct.
$arraydata is neither passed to subsequent calls to xml2array() nor is the return value used, so yes "My recursion isn't great..." is true ;-)
To append a new element to an existing array you can use empty square brackets, $arr[] = 123; $arr[$x][] = 123;
You might also want to check out XML Unserializer
http://pear.php.net/package/XML_Serializer/redirected

Categories