I'm trying to convert some XML files to CSV using PHP's SimpleXML class. However, I can't get the result I want, because one parent can have several child elements with the same name. My current XML file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<club>
<name>Green Riders</name>
<membership>Free</membership>
<boardMember>
<name>James F.</name>
<position>CEO</position>
</boardMember>
<boardMember>
<name>Helen D.</name>
<position>Associate Director</position>
</boardMember>
</club>
<club>
<name>Broken Dice</name>
<membership>Paid</membership>
<boardMember>
<name>Patrick B.</name>
<position>CEO</position>
</boardMember>
</club>
</root>
The CSV output I was hoping to achieve is as such:
club,name,membership,boardMember>Name,boardMember>position
Green Riders,Free,James F.,CEO
Green Riders,Free,Helen D.,Associate Director
Broken Dice,Paid,Patrick B.,CEO
Is there any way to achieve this without hard-coding the element names into the script (i.e. make it work on any generic XML file)?
I'm really hoping this is possible, since I'll have more than 25 XML variants, so writing a dedicated script for each would be really inefficient.
Thanks!
Since each child node's data needs to become a CSV row that includes its parent's data, you can first capture and store the parent (club) data, then traverse the children and print each child's data preceded by the parent's data.
Please check the following code:
$xml = simplexml_load_file("your_xml_file.xml") or die("Error: Cannot create object");
$csv_delimeter = ",";
$csv_new_line = "\n";
// each child of <root> is a <club>
foreach ($xml->children() as $n) {
    // store the club (parent) data first
    $club_data = array();
    $club_data[] = (string) $n->name;
    $club_data[] = (string) $n->membership;
    if (isset($n->boardMember)) {
        // one row per board member, prefixed with the club data
        foreach ($n->boardMember as $boardMember) {
            $boardMember_data = $club_data;
            $boardMember_data[] = (string) $boardMember->name;
            $boardMember_data[] = (string) $boardMember->position;
            echo implode($csv_delimeter, $boardMember_data).$csv_new_line;
        }
    }
    else {
        // a club without board members still gets its own row
        echo implode($csv_delimeter, $club_data).$csv_new_line;
    }
}
Tested with the example XML data, it generates the following output:
Green Riders,Free,James F.,CEO
Green Riders,Free,Helen D.,Associate Director
Broken Dice,Paid,Patrick B.,CEO
You can set different values based on your scenario for:
$csv_delimeter = ",";
$csv_new_line = "\n";
CSV conventions are not strictly enforced in practice: the delimiter might be ",", ";" or "|", and the line ending might be "\r\n" instead of "\n".
The code prints CSV rows one by one on the fly. If you want to save the CSV data to a file instead, a better approach (unless the XML data is large) may be to build the entire array first and write it once, since disk access is costly. There are plenty of simple PHP array-to-CSV examples online, and PHP's built-in fputcsv() takes care of quoting values for you.
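A minimal sketch of that build-then-write approach, assuming the same <root>/<club>/<boardMember> structure as the question. clubsToRows() and writeCsv() are hypothetical helper names. Unlike implode(), fputcsv() quotes values that contain the delimiter, quotes or spaces, and the $delimiter parameter covers the ";" or "|" variants.

```php
<?php
// Sketch: collect every CSV row first, then write them all at the end.
// clubsToRows() and writeCsv() are hypothetical helper names.
function clubsToRows(SimpleXMLElement $xml): array
{
    $rows = [];
    foreach ($xml->children() as $club) {
        // parent (club) data that prefixes each board-member row
        $clubData = [(string) $club->name, (string) $club->membership];
        if (isset($club->boardMember)) {
            foreach ($club->boardMember as $member) {
                $rows[] = array_merge($clubData, [(string) $member->name, (string) $member->position]);
            }
        } else {
            $rows[] = $clubData; // club without board members
        }
    }
    return $rows;
}

function writeCsv(string $path, array $rows, string $delimiter = ','): void
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    foreach ($rows as $row) {
        // fputcsv() quotes fields containing the delimiter, quotes or spaces;
        // the empty escape argument (PHP 7.4+) disables backslash escaping.
        fputcsv($handle, $row, $delimiter, '"', '');
    }
    fclose($handle);
}

// Usage: writeCsv('clubs.csv', clubsToRows(simplexml_load_file('your_xml_file.xml')));
```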
It is not really possible in a generic way. XML is a nested structure, and flattening it into rows loses information. You can define some default mapping for XML structures, but that gets complex very quickly, so it is far easier (and less time-consuming) to define the mapping by hand.
A Reusable Conversion
function readXMLAsRecords(string $xml, array $map) {
    // load the xml
    $document = new DOMDocument();
    $document->loadXML($xml);
    $xpath = new DOMXPath($document);
    // iterate the elements defining the rows
    foreach ($xpath->evaluate($map['row']) as $row) {
        $line = [];
        // get the field values from the current $row
        foreach ($map['columns'] as $name => $expression) {
            $line[$name] = $xpath->evaluate($expression, $row);
        }
        // yield one line (CSV record) per row element
        yield $line;
    }
}
The Mapping
With DOMXPath::evaluate(), XPath expressions can return strings, so we need one expression that returns the boardMember nodes and a list of expressions for the fields, each evaluated relative to the current boardMember node.
$map = [
'row' => '/root/club/boardMember',
'columns' => [
'club_name' => 'string(parent::club/name)',
'club_membership' => 'string(parent::club/membership)',
'board_member_name' => 'string(name)',
'board_member_position' => 'string(position)'
]
];
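Because only the map describes the structure, switching to another XML variant means writing another map rather than another script. As a sketch, a map for an Internet Archive file listing like the one in a later question on this page might look like this; the column choice is illustrative, and the function is repeated so the snippet runs on its own.

```php
<?php
// Same generator as above, repeated so this sketch is self-contained.
function readXMLAsRecords(string $xml, array $map): Generator
{
    $document = new DOMDocument();
    $document->loadXML($xml);
    $xpath = new DOMXPath($document);
    foreach ($xpath->evaluate($map['row']) as $row) {
        $line = [];
        foreach ($map['columns'] as $name => $expression) {
            // string(...) expressions make evaluate() return plain strings
            $line[$name] = $xpath->evaluate($expression, $row);
        }
        yield $line;
    }
}

// Hypothetical second mapping: one record per <file> element.
$filesMap = [
    'row' => '/files/file',
    'columns' => [
        'name' => 'string(@name)',
        'format' => 'string(format)',
        'size' => 'string(size)',
    ],
];

// Usage: foreach (readXMLAsRecords($listingXml, $filesMap) as $record) { print_r($record); }
```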
To CSV
readXMLAsRecords() returns a generator, so you can iterate over it with foreach:
$csv = fopen('php://stdout', 'w');
fputcsv($csv, array_keys($map['columns']));
foreach (readXMLAsRecords($xml, $map) as $record) {
fputcsv($csv, $record);
}
Output:
club_name,club_membership,board_member_name,board_member_position
"Green Riders",Free,"James F.",CEO
"Green Riders",Free,"Helen D.","Associate Director"
"Broken Dice",Paid,"Patrick B.",CEO
I'm looping through a directory, trying to find XML files with errors.
$baddies = array();
foreach (glob("fonts/*.svg") as $filename) {
libxml_use_internal_errors(true);
$str = file_get_contents($filename);
$sxe = simplexml_load_string($str);
$errors = libxml_get_errors();
$num_of_errors = 0;
$num_of_errors = sizeof($errors);
if ($num_of_errors > 0){
array_unshift($baddies, $filename);
}
}
However, it seems that once errors are recorded, they persist through subsequent iterations of the loop, so files without errors still test positive: $num_of_errors remains high for good files. I reset it to zero and have even tried unsetting it after each pass through the loop. I suppose libxml_get_errors() keeps returning the stored errors once they are set. How can I reset it?
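Use the libxml_clear_errors() function. According to the PHP documentation, libxml keeps errors in a buffer (which is what libxml_get_errors() reads) until they are cleared, so call libxml_clear_errors() at the end of each loop iteration.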
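A sketch of the loop from the question with the buffer cleared after every file. findBadXmlFiles() is a hypothetical wrapper; it appends to the result instead of using array_unshift(), and it restores the previous libxml_use_internal_errors() setting when done.

```php
<?php
// Sketch: return the paths whose contents fail to parse as XML.
// findBadXmlFiles() is a hypothetical helper name.
function findBadXmlFiles(array $filenames): array
{
    // collect parser errors in the buffer instead of emitting warnings
    $previous = libxml_use_internal_errors(true);
    $baddies = [];
    foreach ($filenames as $filename) {
        $str = file_get_contents($filename);
        simplexml_load_string($str);
        if (count(libxml_get_errors()) > 0) {
            $baddies[] = $filename;
        }
        // empty the buffer so this file's errors don't leak into the next check
        libxml_clear_errors();
    }
    libxml_use_internal_errors($previous);
    return $baddies;
}

// Usage: print_r(findBadXmlFiles(glob("fonts/*.svg")));
```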
I'm not sure if this is even possible, but I'm trying to extract all the anchor-tag links from a few HTML files on my website. I've written a PHP script that scans a few directories and subdirectories and builds an array of HTML file paths. Here is that code:
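$di = new RecursiveDirectoryIterator('Migration');
$migrate = array();
foreach (new RecursiveIteratorIterator($di) as $filename => $file) {
    // eregi() was removed in PHP 7; preg_match() with the /i flag replaces it.
    // The pattern is unanchored, so "Billing.htm.mno" matches as well.
    if (preg_match('/\.htm/i', $filename)) {
        $migrate[] = $filename;
    }
}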
This successfully produces the HTML file paths I need, for example:
Migration/administration/billing/Billing.htm
Migration/administration/billing/_notes/Billing.htm.mno
Migration/administration/new business/_notes/New Business.htm.mno
Migration/administration/new business/New Business.htm
Migration/account/nycds/_notes/NYCDS Index.htm.mno
Migration/account/nycds/NYCDS Index.htm
There are more, but this gives you an idea. The next part is where I'm stuck. I think I need a loop that goes through each array element, opens the file, extracts the links, and stores them somewhere, but I'm not sure how to go about it. I tried to google this, but never found results that matched what I was trying to do. Here is the simplified for loop I have:
var obj = <?php echo json_encode($migrate); ?>;
for(var i=0;i< obj.length;i++){
// alert(obj[i]);
}
The above code is JavaScript. From what I've read, it seems I shouldn't be using JavaScript for this and should keep using PHP instead. I'm confused about what my next steps should be. If someone can point me in the right direction, I would really appreciate it. Thank you so much for your time.
Use DOMDocument::getElementsByTagName() to retrieve all <a> tags:
http://www.php.net/manual/en/domdocument.getelementsbytagname.php
Example:
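$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about malformed real-world HTML
$doc->loadHTMLFile("filename.html");
$anchors = $doc->getElementsByTagName('a'); // retrieve all anchor tags
foreach ($anchors as $a) { // loop over the anchors
    echo $a->getAttribute('href'), "\n"; // the link target ($a->nodeValue would be the link text)
}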
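Since the file list is already built in PHP, the link extraction can stay in PHP too. A sketch, assuming the $migrate array from the question; extractLinks() is a hypothetical helper that reads each file with file_get_contents() and DOMDocument::loadHTML() and keeps only <a> elements that actually have an href.

```php
<?php
// Sketch: map each HTML file path to the href values of its <a> elements.
// extractLinks() is a hypothetical helper name.
function extractLinks(array $paths): array
{
    $previous = libxml_use_internal_errors(true); // hide warnings about sloppy HTML
    $links = [];
    foreach ($paths as $path) {
        $html = file_get_contents($path);
        if ($html === false || trim($html) === '') {
            continue; // unreadable or empty file
        }
        $doc = new DOMDocument();
        $doc->loadHTML($html);
        $links[$path] = [];
        foreach ($doc->getElementsByTagName('a') as $a) {
            if ($a->hasAttribute('href')) {
                $links[$path][] = $a->getAttribute('href');
            }
        }
        libxml_clear_errors(); // drop this file's parser warnings
    }
    libxml_use_internal_errors($previous);
    return $links;
}

// Usage: print_r(extractLinks($migrate));
```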
I'm retrieving files like so (from the Internet Archive):
<files>
<file name="Checkmate-theHumanTouch.gif" source="derivative">
<format>Animated GIF</format>
<original>Checkmate-theHumanTouch.mp4</original>
<md5>72ec7fcf240969921e58eabfb3b9d9df</md5>
<mtime>1274063536</mtime>
<size>377534</size>
<crc32>b2df3fc1</crc32>
<sha1>211a61068db844c44e79a9f71aa9f9d13ff68f1f</sha1>
</file>
<file name="CheckmateTheHumanTouch1961.thumbs/Checkmate-theHumanTouch_000001.jpg" source="derivative">
<format>Thumbnail</format>
<original>Checkmate-theHumanTouch.mp4</original>
<md5>6f6b3f8a779ff09f24ee4cd15d4bacd6</md5>
<mtime>1274063133</mtime>
<size>1169</size>
<crc32>657dc153</crc32>
<sha1>2242516f2dd9fe15c24b86d67f734e5236b05901</sha1>
</file>
</files>
The listing can contain any number of <file> elements, and I'm only looking for the ones that are thumbnails. Each time I find one, I want to increment a counter. When I've gone through the whole file, I want to find the middle thumbnail and return its name attribute.
Here's what I've got so far:
//pop previously retrieved XML file into a variable
$elem = new SimpleXMLElement($xml_file);
//establish variable
$i = 0;
// Look through each parent element in the file
foreach ($elem as $file) {
if ($file->format == "Thumbnail"){$i++;}
}
//find the middle thumbnail.
$chosenThumb = ceil(($i/2)-1);
//Gloriously announce the name of the chosen thumbnail.
echo($elem->file[$chosenThumb]['name']);
The final echo doesn't work; it seems not to like a variable selecting the XML element, although it works fine when I hardcode the index. Can you guess that I'm new to handling XML files?
Edit:
Francis Avila's answer below sorted me right out:
$sxe = simplexml_load_file($url);
$thumbs = $sxe->xpath('/files/file[format="Thumbnail"]');
$n_thumbs = count($thumbs);
$middlethumb = $thumbs[(int) ($n_thumbs/2)];
$happy_string = (string) $middlethumb['name'];
echo $happy_string;
Use XPath.
$sxe = simplexml_load_file($url);
$thumbs = $sxe->xpath('/files/file[format="Thumbnail"]');
$n_thumbs = count($thumbs);
$middlethumb = $thumbs[(int) ($n_thumbs/2)];
$middlethumbname = (string) $middlethumb['name'];
You can also accomplish this with a single XPath expression if you don't need the total count:
$thumbs = $sxe->xpath('/files/file[format="Thumbnail"][position() = floor(last() div 2) + 1]');
$middlethumbname = (count($thumbs)) ? (string) $thumbs[0]['name'] : '';
A limitation of SimpleXML's xpath method is that it can only return nodes and not simple types. This is why you need to use $thumbs[0]['name']. If you use DOMXPath::evaluate(), you can do this instead:
$doc = new DOMDocument();
$doc->load($url);
$xp = new DOMXPath($doc);
$middlethumbname = $xp->evaluate('string(/files/file[format="Thumbnail"][position() = floor(last() div 2) + 1]/@name)');
$elem->file[$chosenThumb] gives you the $chosenThumb-th element of all the file elements, not of the filtered (thumbnail-only) ones, right? Collect the thumbnails into their own array and index into that:
$filteredFiles = array();
foreach ($elem as $file) {
    if ($file->format == "Thumbnail") {
        $filteredFiles[] = $file; // keep only the thumbnails
    }
}
$chosenThumb = (int) ceil((count($filteredFiles) / 2) - 1);
//echo($elem->file[$chosenThumb]['name']);
echo($filteredFiles[$chosenThumb]['name']);
Some problems:
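The middle thumbnail is looked up in the wrong list: you have to keep a separate array of thumbnails and pick the middle one using count().
$elem->file works as written; the {'file'} syntax is only needed for element names that are not valid PHP identifiers (e.g. names containing a hyphen).
There is no default thumbnail for listings that contain none.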
Use this code instead:
$files = new SimpleXMLElement($xml_file);
$thumbs = array();
foreach ($files as $file) {
    if ($file->format == "Thumbnail") {
        $thumbs[] = $file;
    }
}
$chosenThumb = (int) ceil((count($thumbs) / 2) - 1);
echo (count($thumbs) === 0) ? 'default-thumbnail.png' : $thumbs[$chosenThumb]['name'];
Edit: I'd still recommend the other answer's XPath solution, though; it's much easier.
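Combining the XPath suggestion with the fallback from this answer, a sketch might look like this. middleThumbnailName() and the default file name are illustrative; the index matches (int) ($n_thumbs/2) from the XPath answer, so for an even count it picks the later of the two middle entries.

```php
<?php
// Sketch: middle thumbnail via XPath, with a fallback when there are none.
// middleThumbnailName() is a hypothetical helper name.
function middleThumbnailName(string $xml, string $fallback = 'default-thumbnail.png'): string
{
    $sxe = new SimpleXMLElement($xml);
    $thumbs = $sxe->xpath('/files/file[format="Thumbnail"]');
    if (!$thumbs) {
        return $fallback; // empty result (or xpath failure)
    }
    // zero-based middle index, same as (int) ($n_thumbs / 2)
    $middle = $thumbs[intdiv(count($thumbs), 2)];
    return (string) $middle['name'];
}

// Usage: echo middleThumbnailName(file_get_contents($url));
```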
I'm trying to edit some XML data and then save it to a file.
The problem is that SimpleXML doesn't save the edited data, even though the nodes appear to have changed.
$spieler = $xml->xpath("/planer/spieltag[@datum='" .$_GET['date']. "']/spielerliste/spieler");
for ( $i = 1; $i < 13; $i++ ){
if (!empty($_POST['spieler' .$i ])){
$spieler[$i-1] = $_POST['spieler' .$i];
}
}
var_dump($spieler);
$xml->asXML("data.xml");
var_dump() shows the new data, but asXML() doesn't.
Make sure your script has write permission to data.xml.
The elements of the XPath result array aren't PHP references ($ref = &$var) to the actual tree nodes, so this line
$spieler[$i-1] = $_POST['spieler' .$i];
isn't modifying anything in the tree; you're simply overwriting an entry in a completely independent array.
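Each array entry is, however, a SimpleXMLElement pointing into the document, and assigning to its [0] offset (the element's self-reference) replaces that element's text in the tree, so asXML() sees the change. A sketch, where updatePlayers() and the sample names are hypothetical; note that concatenating request data such as $_GET['date'] into an XPath expression is open to XPath injection, so validate it first.

```php
<?php
// Sketch: change the text of nodes found with SimpleXMLElement::xpath().
// updatePlayers() is hypothetical; $names maps a zero-based position to a new name.
function updatePlayers(SimpleXMLElement $xml, string $date, array $names): void
{
    // validate $date before building the XPath string from it
    $spieler = $xml->xpath('/planer/spieltag[@datum="' . $date . '"]/spielerliste/spieler');
    foreach ($names as $i => $name) {
        if (isset($spieler[$i]) && $name !== '') {
            $node = $spieler[$i];
            // $spieler[$i] = $name would only replace the array entry;
            // writing to the [0] self-reference changes the node in the tree.
            $node[0] = $name;
        }
    }
}

// Usage with the question's form fields:
// $names = [];
// for ($i = 1; $i < 13; $i++) { $names[$i - 1] = $_POST['spieler' . $i] ?? ''; }
// updatePlayers($xml, $_GET['date'], $names);
// $xml->asXML("data.xml");
```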