Download and merge multiple XML files with PHP, foreach and DOM

I've been struggling with this problem for a week: I'm trying to download and dynamically merge multiple XML files from an API. I can download all the files, but I can't merge them without ending up with multiple root elements. It's frustrating and I can't find any suggestions. Here is my code:
<?php
$fileout = 'file.xml';
unlink($fileout);
$baseurl="https://websitewithapi.com/";
$topcategories=array("COOL","DRIVE","FUN");
foreach ($topcategories as $topcategory) {
$url_cata_test="https://websitewithapi.com/&filters=topcategory:$topcategory&limit=1";
$jsontest = file_get_contents($url_cata_test);
$arrtest=json_decode($jsontest);
$items=$arrtest->pagination->count;
$pagemax=ceil($items/250);
$pagetest= range(0,$pagemax);
foreach ($pagetest as $page) {
$url_cata="$baseurl&filters=topcategory:$topcategory&offset=$page&limit=250";
echo "Cat en cours d import: ".$topcategory."\n";
echo "Page en cours d import: ".$page."\n";
echo "URL Cata: $url_cata \n";
};
$dom = new DOMDocument();
$dom->appendChild($dom->createElement('superdeals'));
$files= array($url_cata);
foreach ($files as $filename) {
$addDom = new DOMDocument();
$addDom->load($filename);
if ($addDom->documentElement->getElementsByTagName('products')) {
foreach ($addDom->documentElement->getElementsByTagName('product') as $node) {
$dom->documentElement->appendChild(
$dom->importNode($node, TRUE)
);
}
}
$dom->formatOutput = true;
file_put_contents($fileout, $dom->saveXML(), FILE_APPEND);
}
};
?>
I always end up with the same problem: the files are combined into one file, but with multiple roots! Is there something I'm missing?
Thank you.

Initialize the DOM object once, before the foreach loops, and save its output once, after them. Currently you call file_put_contents() with FILE_APPEND on every iteration. That is not an XML DOM operation but plain text concatenation, so each append writes another complete document with its own root. Keep growing one XML tree inside the loops, then write the single document once, without FILE_APPEND.
$fileout = 'file.xml';
unlink($fileout);
// INITIALIZE DOM TREE
$dom = new DOMDocument();
$dom->appendChild($dom->createElement('superdeals'));
...
foreach ($topcategories as $topcategory) {
...
foreach ($pagetest as $page) {
...
foreach ($files as $filename) {
...
foreach ($addDom->documentElement->getElementsByTagName('product') as $node) {
$dom->documentElement->appendChild(
$dom->importNode($node, TRUE)
);
}
}
}
}
// OUTPUT DOM TREE
file_put_contents($fileout, $dom->saveXML());
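To make the pattern concrete, here is a minimal, self-contained sketch. It assumes the API pages look like a `products` element wrapping `product` elements; the inline strings in `$pages` are hypothetical stand-ins for the downloaded pages, not real API output.

```php
<?php
// Hypothetical stand-ins for the API responses; in the real script these
// would be loaded from the URLs built inside the page loop.
$pages = [
    '<products><product id="1"/><product id="2"/></products>',
    '<products><product id="3"/></products>',
];

// Create the target document and its single root element once.
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true;
$root = $dom->appendChild($dom->createElement('superdeals'));

foreach ($pages as $xml) {
    $addDom = new DOMDocument();
    if (!$addDom->loadXML($xml)) {
        continue; // skip pages that fail to parse
    }
    foreach ($addDom->getElementsByTagName('product') as $node) {
        // importNode() copies the node into $dom; true also copies its children.
        $root->appendChild($dom->importNode($node, true));
    }
}

// Serialize once, after all loops have finished.
$merged = $dom->saveXML();
echo $merged;
```

Because the document is serialized only once, the output has exactly one `superdeals` root regardless of how many pages were merged.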

It looks like you are adding a root element called "superdeals" to your document, and then adding the contents of each file at the root level.
You need to add the contents of each file as a child of the "superdeals" element, not as a child of the document.
Save the root node:
$root = $dom->appendChild($dom->createElement('superdeals'));
then instead of
$dom->documentElement->appendChild($dom->importNode($node, TRUE))
add child to the root node (not the document node):
$root->appendChild($dom->importNode($node, TRUE))
The document node can contain nodes other than the root element, such as the document type declaration (which holds entity definitions), processing instructions and so on.

Related

How to get element count in multiple xml files in a folder using PHP?

The following PHP script gives the count of elements in a single XML file in the uploads folder. But I have a number of XML files in that folder. What should I modify in the script so that I get the result in tabular format, with the file name and element count for every XML file in the folder?
<?php
$doc = new DOMDocument;
$xml = simplexml_load_file("uploads/test.xml");
//file to SimpleXMLElement
$xml = simplexml_import_dom($xml);
print("Number of elements: ".$xml->count());
?>
simplexml_load_file() already returns a SimpleXMLElement, and count() is a method of SimpleXMLElement (DOMElement has no such method). simplexml_import_dom() is meant for converting a DOM node into a SimpleXMLElement, so calling it on that result is redundant, as is the unused DOMDocument instance.
You can use a GlobIterator to iterate the files:
$directory = __DIR__.'/uploads';
// get an iterator for the XML files
$files = new GlobIterator(
$directory.'/*.xml', FilesystemIterator::CURRENT_AS_FILEINFO
);
$results = [];
foreach ($files as $file) {
// load file using absolute file path
// the returned SimpleXMLElement wraps the document element node
$documentElement = simplexml_load_file($file->getRealPath());
$results[] = [
// file name without path
'file' => $file->getFilename(),
// "SimpleXMLElement::count()" returns the number of children of an element
'item-count' => $documentElement->count(),
];
}
var_dump($results);
With DOM you can use Xpath to fetch specific values from the XML.
$directory = __DIR__.'/uploads';
// get an iterator for the XML files
$files = new GlobIterator(
$directory.'/*.xml', FilesystemIterator::CURRENT_AS_FILEINFO
);
// only one document instance is needed
$document = new DOMDocument();
$results = [];
foreach ($files as $file) {
// load the file into the DOM document
$document->load($file->getRealPath());
// create an Xpath processor for the loaded document
$xpath = new DOMXpath($document);
$results[] = [
'file' => $file->getFilename(),
// use an Xpath expression to fetch the value
'item-count' => $xpath->evaluate('count(/*/*)'),
];
}
var_dump($results);
The Xpath expression, step by step:
Get the document element: /*
Get the child elements of the document element: /*/*
Count them: count(/*/*)
* is a wildcard that matches any element node. If you can, be more specific and use the actual element names (e.g. /list/item).
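As a small illustration of the difference between the wildcard and a specific path, here is a sketch that runs both expressions against an inline document. The sample XML and its element names (`list`, `item`, `note`) are assumptions made up for the example.

```php
<?php
// Hypothetical sample document; the element names are assumptions.
$xml = '<list><item>a</item><item>b</item><note>n</note></list>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// Any child element of the document element.
$allChildren = $xpath->evaluate('count(/*/*)');
// Only <item> children of a <list> document element.
$itemsOnly = $xpath->evaluate('count(/list/item)');

// count() yields an XPath number, which PHP returns as a float.
var_dump($allChildren, $itemsOnly);
```

Here the wildcard counts all three children, while the specific path counts only the two `item` elements.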
First, create a function with the logic you have:
function getXML($path) {
    // simplexml_load_file() already returns a SimpleXMLElement,
    // so no DOMDocument or simplexml_import_dom() call is needed
    return simplexml_load_file($path);
}
Note that I:
have converted the path into a parameter, so you can reuse the same logic for your files
separated the parsing of XML from showing it
returned the XML itself, so you can get the count or you can do whatever else you may want with it
This is how you can get the files of a given path:
$files = array_diff(scandir('uploads'), array('.', '..'));
This returns all entries except . and .., which are of no interest here. Read more about scandir here: https://www.php.net/manual/en/function.scandir.php
scandir returns an array of filenames on success, so let's loop over it and perform the logic you need:
$xmls = [];
foreach ($files as $file) {
if (str_ends_with($file, '.xml')) {
$xmls[] = $file . "\t" . getXML('uploads/' . $file)->count();
}
}
echo implode("\n", $xmls);
EDIT
As @Juan kindly explained in the comment section, one can use
$files = glob("./uploads/*.xml");
instead of scandir. That way we no longer need the array_diff call or the if inside the loop. Note that glob returns each path including the folder, so the path is passed to getXML as-is and basename gives the bare file name:
$xmls = [];
foreach ($files as $file) {
    $xmls[] = basename($file) . "\t" . getXML($file)->count();
}
echo implode("\n", $xmls);
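For the tabular output the question asks for, a minimal sketch could look like the following. It creates a throwaway folder with two made-up sample files (`a.xml`, `b.xml`) so that it runs on its own; in practice the glob pattern would point at the uploads folder instead.

```php
<?php
// Build a throwaway folder with two sample files so the sketch is runnable.
$dir = sys_get_temp_dir() . '/xml-count-' . uniqid();
mkdir($dir);
file_put_contents("$dir/a.xml", '<list><item/><item/></list>');
file_put_contents("$dir/b.xml", '<list><item/></list>');

$rows = [];
foreach (glob("$dir/*.xml") as $path) {
    $xml = simplexml_load_file($path);
    if ($xml === false) {
        continue; // not well-formed, skip it
    }
    // glob() returns the path including the folder, so strip it for display.
    $rows[basename($path)] = $xml->count();
}

// Print a simple two-column table.
printf("%-20s %s\n", 'File', 'Elements');
foreach ($rows as $name => $count) {
    printf("%-20s %d\n", $name, $count);
}

// Clean up the temporary files.
array_map('unlink', glob("$dir/*.xml"));
rmdir($dir);
```

Keeping the counts in an array keyed by file name separates collecting the data from printing it, so the same array could be rendered as an HTML table instead.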

Loop through contents of folder using file_get_html?

Is it possible to loop through a folder using file_get_html for each file?
I'm currently using the following:
$html = file_get_html('http://example.local/folder/file1.html');
Is it possible to set the path to a folder and have it loop through the folder's contents?
$html = file_get_html('http://example.local/folder/');
The files within the folder could be named anything (there's no set naming convention!) but they will always be html files.
I'm using simple_html_dom.php to get the HTML.
This would be the general idea:
$source = '/some/local/folder/';
foreach (new DirectoryIterator($source) as $fileInfo) {
if($fileInfo->isDot()) continue;
$html = file_get_contents($source.$fileInfo->getFilename());
//do stuff with $html
}

DOMDocument::saveXML(): Memory allocation failed : growing buffer

I receive this error while trying to combine my XML files. I have read other questions and answers but could not find a solution that fits my code. I cannot increase the RAM of the computer. Here is my code:
public function mergeXml ($filename,$source){
$events = array();
// open each xml file in this directory
foreach(glob("$source/*.xml") as $files) {
// get the contents of the the current file
$events[] =$files; // throw all files into an array .
}
// Replace the strings below with the actual filenames, add or decrease as fit
$out = new \DOMDocument();
$root = $out->createElement("documents");
foreach ($events as $file) { //get each file from array
$obj = new \DOMDocument();
$obj->load($file); //load files to obj.
$xpath = new \DOMXPath($obj);
foreach ($xpath->query("/*/node()") as $node) {
$root->appendChild($out->importNode($node, true));
}
}
$out->appendChild($root);
file_put_contents("$source/$filename.xml",$out->saveXML());
}

How do I edit a compressed gz XML file, compress it and save it back, with PHP? [duplicate]

So, I have this code that searches for a particular node in my XML file, unsets an existing node and inserts a brand new child node with the correct data. Is there a way of getting this new data to save within the actual XML file with simpleXML? If not, is there another efficient method for doing this?
public function hint_insert() {
foreach($this->hints as $key => $value) {
$filename = $this->get_qid_filename($key);
echo "$key - $filename - $value[0]<br>";
//insert hint within right node using simplexml
$xml = simplexml_load_file($filename);
foreach ($xml->PrintQuestion as $PrintQuestion) {
unset($xml->PrintQuestion->content->multichoice->feedback->hint->Passage);
$xml->PrintQuestion->content->multichoice->feedback->hint->addChild('Passage', $value[0]);
echo("<pre>" . print_r($PrintQuestion) . "</pre>");
return;
}
}
}
Not sure I understand the issue. The asXML() method accepts an optional filename as param that will save the current structure as XML to a file. So once you have updated your XML with the hints, just save it back to file.
// Load XML with SimpleXml from string
$root = simplexml_load_string('<root><a>foo</a></root>');
// Modify a node
$root->a = 'bar';
// Saving the whole modified XML to a new filename
$root->asXml('updated.xml');
// Save only the modified node
$root->a->asXml('only-a.xml');
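If you want formatted output, load the XML into a DOMDocument and save from there (dom_import_simplexml() can alternatively convert a SimpleXMLElement to a DOMElement without reparsing). Here $simpleXml is the SimpleXMLElement you modified: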
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($simpleXml->asXML());
echo $dom->saveXML();
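Applied to the question's pattern of removing a node and adding a replacement, saving back to the original file could be sketched as follows. The file content and the element names (`root`, `hint`) are simplified assumptions; only `Passage` is taken from the question.

```php
<?php
// Hypothetical file and structure, for illustration only.
$filename = tempnam(sys_get_temp_dir(), 'hint');
file_put_contents($filename, '<root><hint><Passage>old</Passage></hint></root>');

$xml = simplexml_load_file($filename);

// Remove the existing node, then add the replacement.
unset($xml->hint->Passage);
$xml->hint->addChild('Passage', 'new text');

// Passing the original path to asXML() overwrites the file in place.
$xml->asXML($filename);

$saved = file_get_contents($filename);
echo $saved;
unlink($filename);
```

The key point is the `asXML($filename)` call: without it, the changes exist only in memory and are lost when the script ends.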

Not able to update word/_rels/document.xml.rels using PHP ZipArchive

I am trying to load binary data as images into Word documents (Open XML) using PHP for later usage with XSLT.
After opening the Word document as a PHP ZipArchive, I am able to load images into the word/media folder successfully and also update the word/document.xml file. But I am unable to update the <Relationships/> in the word/_rels/document.xml.rels file.
I have already cross-checked that the XML is in the correct format.
The following is the code snippet I am trying to use:
$zipArchive=new ZipArchive();
$zipArchive->open($pathToDoc);
$imagePre="image";
$relIdPre="rId";
$index=100;
$nodeList = $reportDOM->getElementsByTagName("Node");
$i=0;
foreach($nodeList as $node) {
$divList = $node->getElementsByTagName("*");
foreach ($divList as $divNode) {
if (strncasecmp($divNode->nodeName, "wizChart", 8) == 0) {
$imgData=$divNode->getAttribute("src");
$imgData=base64_decode(substr($imgData,22));
$zipArchive->
addFromString("word/media/".$imagePre."".$index.".png",$imgData);
$fp=$zipArchive->getStream("word/_rels/document.xml.rels");
$contents='';
while (!feof($fp)) {
$contents .= fread($fp, 2);
}
$serviceOutput=new DOMDocument();
$serviceOutput->loadXML($contents);
$serviceList=$serviceOutput->getElementsByTagName("Relationships");
$element=$serviceOutput->createElement("Relationship");
$element->setAttribute("Id",$relIdPre."".$index);
$element->setAttribute("Type","http://schemas.openxmlformats.org/officeDocument/2006/relationships/image");
$element->setAttribute("Target","word/media/".$imagePre."".$index.".png");
foreach ($serviceList as $serviceNode) {
$serviceNode->appendChild($element);
}
$zipArchive->addEmptyDir("word/_rels/");
$zipArchive->addFromString("word/_rels/document.xml.rels", $serviceOutput->saveXML());
$index++;
}
}
}
$zipArchive->close();
Could anyone suggest what I might be doing wrong?
You're adding a new content type as well when you add the PNG, so you need to set that in [Content_Types].xml. See "Is it possible to add some data to a Word document?" for more details.
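A minimal sketch of that step, assuming the package does not yet declare a default for the png extension. The inline `$contentTypes` string is a cut-down stand-in for a real [Content_Types].xml, which would normally be read from the archive and lists more parts.

```php
<?php
// Minimal stand-in for [Content_Types].xml; a real package lists more parts.
$contentTypes = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    . '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    . '<Default Extension="xml" ContentType="application/xml"/>'
    . '</Types>';

$ns = 'http://schemas.openxmlformats.org/package/2006/content-types';
$doc = new DOMDocument();
$doc->loadXML($contentTypes);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('ct', $ns);

// Only add the PNG default if the package does not declare it yet.
if ($xpath->evaluate('count(/ct:Types/ct:Default[@Extension="png"])') == 0) {
    $default = $doc->createElementNS($ns, 'Default');
    $default->setAttribute('Extension', 'png');
    $default->setAttribute('ContentType', 'image/png');
    $doc->documentElement->appendChild($default);
}

$updated = $doc->saveXML();
echo $updated;
// In the real script the result would be written back to the archive with
// $zipArchive->addFromString('[Content_Types].xml', $updated);
```

The elements in this file live in the content-types namespace, which is why the sketch uses `createElementNS` and a registered prefix in the Xpath query rather than plain element names.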
