I have several XML files that share the same elements and differ only in their data.
First file test.xml
<?xml version="1.0" encoding="UTF-8"?>
<phones>
<phone>
<title>"Apple iPhone 5S"</title>
<price>
<regularprice>500</regularprice>
<saleprice>480</saleprice>
</price>
<color>black</color>
</phone>
</phones>
Second file test1.xml
<?xml version="1.0" encoding="UTF-8"?>
<phones>
<phone>
<title>Nokia Lumia 830</title>
<price>
<regularprice>400</regularprice>
<saleprice>370</saleprice>
</price>
<color>black</color>
</phone>
</phones>
I need to convert some values from these XML files into a single test.csv file, so I am using this PHP code:
<?php
$filexml1='test.xml';
$filexml2='test1.xml';
//File 1
if (file_exists($filexml1)) {
$xml = simplexml_load_file($filexml1);
$f = fopen('test.csv', 'w');
$headers = array('title', 'color');
$converted_array = array_map("strtoupper", $headers);
fputcsv($f, $converted_array, ',', '"');
foreach ($xml->phone as $phone) {
//$phone->title = trim($phone->title, " ");
// Array of just the components you need...
$values = array(
"title" => (string)$phone->title = trim(str_replace ( "\"", "&quot;", $phone->title ), " "),
"color" => (string)$phone->color
);
fputcsv($f, $values,',','"');
}
fclose($f);
echo "<p>File 1 converted to .csv successfully</p>";
} else {
exit('Failed to open test.xml.');
}
//File 2
if (file_exists($filexml2)) {
$xml = simplexml_load_file($filexml2);
$f = fopen('test.csv', 'a');
//the same code for second file like for the first file
echo "<p>File 2 converted to .csv successfully</p>";
} else {
exit('Failed to open test1.xml.');
}
?>
The output of the test.csv looks this way
TITLE COLOR
Apple iPhone 5S black
Nokia Lumia 830 black
As you can see, I only managed to load each file into its own variable, and for each file I have to write a separate if statement, which makes the script too big. So I am wondering: is it possible to load all files into an array, process them with one code block (the XML elements are the same), and output to one .csv file? Essentially I need the same test.csv output, only with less PHP code.
Thanks in advance.
Besides an array, PHP offers other constructs that can make this even simpler. Just as an array can represent the list of your files, other constructs in PHP can do that, too.
For example, as the XML files you have most likely are inside a specific directory and follow some pattern with their filename, those could be easily represented with a GlobIterator:
$inputFiles = new GlobIterator(__DIR__ . '/*.xml');
You could then foreach over them which I'll show in a moment with another example.
Such a list allows you to streamline your processing. That matters because many programs follow a generic formula: Input, Process, Output. This is also called the IPO model, or IPO+S, where the S stands for storage. In your case, while you process the input data you also store it into a new CSV file, which is also the output (once processing is fully done).
When you follow such a generic model, it's easier to structure your code, and better-structured code is usually shorter. Even when it is not, each part is smaller and more self-contained, which is most often what you're looking for.
Besides the list of XML files I showed at the beginning of the answer with the GlobIterator, there are other iterators that can help to process the XML data.
For example, you've got 1-n XML files that contain 0-n <phone> elements. You know that you want to process any of these <phone> elements, you already exactly know what you want to do with them (extract some data from it). So wouldn't it be great to have a list of all <phone> elements within all XML-files first?
This can be easily done in PHP with the help of a Generator. That is a function that can return values multiple times while it's still "running". This is a simplification, better show some code to illustrate that. Let's say we've got the list of XML files as input and we want all <phone> elements out of it. For sure, you could create an array of all these <phone> elements and process that array later. However, a Generator is able to offer all these <phone> elements directly to be used within a foreach loop:
function extract_phones(Traversable $files) {
foreach ($files as $file) {
$xml = simplexml_load_file($file);
if ($xml === false) {
continue;
}
foreach ($xml->phone as $phone) {
yield $phone;
}
}
}
As this exemplary generator function shows, it goes over all $files, tries to load each as a SimpleXMLElement and, if successful, iterates over all <phone> elements and yields them.
That means, if the function extract_phones is called within a foreach, that loop will have every <phone> element as SimpleXMLElement:
foreach(extract_phones($inputFiles) as $phone) {
# $phone is a SimpleXMLElement here
}
Your question also asks about creating the CSV file as output. This could be done by creating an SplFileObject, passing it around, and accessing it while processing. It works basically the same as passing the file handle around like you do in your question, but it has better semantics that make the code easier to change later on (you could replace it with another object that behaves the same).
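To illustrate the "another object that behaves the same" point, here is a minimal sketch. The helper name write_phone_rows and the sample rows are made up for this example, and SplTempFileObject stands in for the real output file so that nothing touches the disk:

```php
<?php
// Hypothetical helper: writes rows to any SplFileObject
// (a file on disk, a temp file, php://memory, ...).
function write_phone_rows(SplFileObject $output, iterable $rows): void
{
    foreach ($rows as $row) {
        // Separator, enclosure and escape are passed explicitly;
        // these are the long-standing default values.
        $output->fputcsv($row, ',', '"', '\\');
    }
}

// SplTempFileObject extends SplFileObject and keeps small files in memory.
$output = new SplTempFileObject();
write_phone_rows($output, [
    ['title', 'color'],
    ['Nokia Lumia 830', 'black'],
]);

// Read everything back to see what was written.
$output->rewind();
$csv = '';
foreach ($output as $line) {
    $csv .= $line;
}
echo $csv;
```

Because the helper only asks for an SplFileObject, swapping the in-memory object for new SplFileObject('file.csv', 'w') should not require touching the helper itself.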
Additionally, I've seen a little detail in your code that is worth some discussion first. You're encoding the quotes as HTML entities:
trim(str_replace( "\"", "&quot;", $phone->title ), " ")
You most likely do that because you want to have HTML entities inside the CSV file. However, the CSV file does not need them. You also want the data in the CSV file to be as generic as possible. Whether the CSV file is later used in an HTML context or in a spreadsheet application should not be your concern when you convert the file format. My suggestion is to leave the encoding out here and deal with it where it belongs: later on, for example when you use the data from the CSV file to create some HTML.
That keeps your conversion and the data clean. It also removes special cases from your processing, which not only make the code more complicated but are very often the place where we introduce flaws into our programs.
I will simply remove it from my example.
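As a rough sketch of what "dealing with it later" could look like: the CSV line below is the one this conversion produces for the first phone, and the table-row markup is only an assumption about how the data might be displayed.

```php
<?php
// One row as it comes back from the CSV file: plain data, no entities.
$row = str_getcsv('"""Apple iPhone 5S""",black', ',', '"', '\\');

// Escape only now, because the target context is HTML.
$cells = array_map(
    static fn (string $value): string => htmlspecialchars($value, ENT_QUOTES, 'UTF-8'),
    $row
);
$html = '<tr><td>' . implode('</td><td>', $cells) . '</td></tr>';
echo $html, "\n";
```

The same CSV file could then feed a spreadsheet or another program unchanged, since the entity encoding only happens on the HTML path.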
So let's put this all together: get all phones from all XML files and store the fields you are interested in into the output CSV file:
$files = new GlobIterator(__DIR__ . '/*.xml');
$phones = extract_phones($files);
$output = new SplFileObject('file.csv', 'w');
$output->fputcsv($header = ["title", "color"]);
foreach ($phones as $phone) {
$output->fputcsv(
[
$phone->title,
$phone->color,
]
);
}
This then creates the output file you're looking for (without the HTML-entities):
title,color
"""Apple iPhone 5S""",black
"Nokia Lumia 830",black
All this needs is the generator function I've shown above, which in itself is straightforward code. Everything else ships with PHP already. Here is the example code in full:
<?php
/**
* @link http://stackoverflow.com/questions/26074850/convert-multiple-xml-files-to-csv-with-simplexml
*/
function extract_phones(Traversable $files)
{
foreach ($files as $file) {
$xml = simplexml_load_file($file);
if ($xml === false) {
continue;
}
foreach ($xml->phone as $phone) {
yield $phone;
}
}
}
$files = new GlobIterator(__DIR__ . '/*.xml');
$phones = extract_phones($files);
$output = new SplFileObject('file.csv', 'w');
$output->fputcsv($header = ["title", "color"]);
foreach ($phones as $phone) {
$output->fputcsv(
[
$phone->title,
$phone->color,
]
);
}
echo file_get_contents($output->getFilename());
Thanks @Ghost for pointing me in the right direction. So here is my solution.
<?php
$filexml = array ('test.xml', 'test1.xml');
//Headers
$fp = fopen('file.csv', 'w');
$headers = array('title', 'color');
$converted_array = array_map("strtoupper", $headers);
fputcsv($fp, $converted_array, ',', '"');
//XML
foreach ($filexml as $file) {
if (file_exists($file)) {
$xml = simplexml_load_file($file);
foreach ($xml->phone as $phone) {
$values = array(
"title" => (string)$phone->title = trim(str_replace ( "\"", "&quot;", $phone->title ), " "),
"color" => (string)$phone->color
);
fputcsv($fp, $values, ',', '"');
}
echo $file . ' converted to .csv successfully' . '<br>';
} else {
echo $file . ' was not found' . '<br>';
}
}
fclose($fp);
?>
I've been using all sorts of hacks to generate file indexes out of SMB shares. And it's all cool with basic filepath plus metadata indexing.
The next step I want to implement is an algorithm combining some unix-like utilities and php, to index specific context from within files.
Now the first step in this context generation is something like this
while read p; do egrep -rH '^;|\(|^\(|\)$' "$p"; done <textual.txt > text_context_search.txt
This regex is specific to my purpose of indexing program contents: it extracts lines that are whole comments or contain comments from CNC program files.
The resulting output is something like
file_path:regex_hit
Obviously most programs have more than one comment, so there is a lot of redundancy from the repeated paths, and an exhaustive context index is about a gigabyte in size.
I am now working towards a script that would compact the redundancy in this pattern:
file_path_1:regex_hit_1
file_path_1:regex_hit_2
file_path_1:regex_hit_3
...
would become:
file_path_1:regex_hit_1,regex_hit_2,regex_hit_3
If I manage to do this efficiently, all is fine.
The question is whether I'm going about this the right way. Maybe I should be using different tools to generate such a context index in the first place?
EDIT
After further copying and pasting from Stack Overflow and thinking about it, I glued together a solution from code that is not mine, which almost entirely solves the issue described above.
<?php
// https://stackoverflow.com/questions/26238299/merging-csv-lines-where-column-value-is-the-same
$rows = array_map('str_getcsv', file('text_context_search2.1.txt'));
//echo '<pre>';
//print_r($rows);
//echo '</pre>';
// Array for output
$concatenated = array();
// Key to organize over
$sortKey = '0';
// Key to concatenate
$concatenateKey = '1';
// Separator string
$separator = ' ';
foreach($rows as $row) {
// Guard against invalid rows
if (!isset($row[$sortKey]) || !isset($row[$concatenateKey])) {
continue;
}
// Current identifier
$identifier = $row[$sortKey];
if (!isset($concatenated[$identifier])) {
// If no matching row has been found yet, create a new item in the
// concatenated output array
$concatenated[$identifier] = $row;
} else {
// An array has already been set, append the concatenate value
$concatenated[$identifier][$concatenateKey] .= $separator . $row[$concatenateKey];
}
}
// Do something useful with the output
//var_dump($concatenated);
//echo json_encode($concatenated)."\n";
$fp = fopen('exemplar.csv', 'w');
foreach ($concatenated as $fields) {
fputcsv($fp, $fields);
}
fclose($fp);
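Since the full index is about a gigabyte, reading it all at once with file() may become a memory problem. A streaming variant could look roughly like the sketch below. It assumes that all hits for one file appear on adjacent lines (which is how grep -r emits them) and that paths contain no colon; the function name compact_hits and the sample lines are made up for the demo.

```php
<?php
/**
 * Hypothetical helper: merges consecutive "path:hit" lines that share a path.
 * Only one group of hits is held in memory at a time.
 */
function compact_hits($in, $out, string $separator = ','): void
{
    $currentPath = null;
    $hits = [];
    while (($line = fgets($in)) !== false) {
        // Split on the first colon only; the hit itself may contain colons.
        $parts = explode(':', rtrim($line, "\r\n"), 2);
        if (count($parts) < 2) {
            continue; // not a "path:hit" line
        }
        [$path, $hit] = $parts;
        if ($path !== $currentPath) {
            // A new path starts: flush the previous group first.
            if ($currentPath !== null) {
                fwrite($out, $currentPath . ':' . implode($separator, $hits) . "\n");
            }
            $currentPath = $path;
            $hits = [];
        }
        $hits[] = $hit;
    }
    // Flush the last group.
    if ($currentPath !== null) {
        fwrite($out, $currentPath . ':' . implode($separator, $hits) . "\n");
    }
}

// Demo with in-memory streams instead of the real index file.
$in = fopen('php://memory', 'w+');
fwrite($in, "a.nc:(ROUGHING)\na.nc:(FINISHING)\nb.nc:;setup note\n");
rewind($in);
$out = fopen('php://memory', 'w+');
compact_hits($in, $out);
rewind($out);
$result = stream_get_contents($out);
echo $result;
```

With real files, the two streams would be opened with fopen() on the input index and the output file instead of php://memory.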
I'm trying to convert some XML files I have to CSV using PHP SimpleXML class. However, I'm unable to achieve the result I want, because one parent could have several child elements with the same name. My current XML file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<club>
<name>Green Riders</name>
<membership>Free</membership>
<boardMember>
<name>James F.</name>
<position>CEO</position>
</boardMember>
<boardMember>
<name>Helen D.</name>
<position>Associate Director</position>
</boardMember>
</club>
<club>
<name>Broken Dice</name>
<membership>Paid</membership>
<boardMember>
<name>Patrick B.</name>
<position>CEO</position>
</boardMember>
</club>
</root>
The CSV output I was hoping to achieve is as such:
club,name,membership,boardMember>Name,boardMember>position
Green Riders,Free,James F.,CEO
Green Riders,Free,Helen D., Associate Director
Broken Dice,Paid,Patrick B., CEO
Is there any way to achieve this without hard-coding the element names into the script (i.e. make it work on any generic XML file)?
I'm really hoping this is possible, given that I'll be having more than 25 XML variants; so would really be inefficient to write a dedicated script for each.
Thanks!
Since every child node's data needs to be a row in the CSV, including its parent's data, you can first capture and store the parent (club) data, then traverse the children and print their data with the parent's data preceding it.
Please check the following code:
$xml = simplexml_load_file("your_xml_file.xml") or die("Error: Cannot create object");
$csv_delimeter = ",";
$csv_new_line = "\n";
foreach($xml->children() as $n) {
$club_data = array();
$club_data[] = $n->name;
$club_data[] = $n->membership;
if (isset($n->boardMember)) {
foreach ($n->boardMember as $boardMember) {
$boardMember_data = $club_data;
$boardMember_data[] = $boardMember->name;
$boardMember_data[] = $boardMember->position;
echo implode($csv_delimeter, $boardMember_data).$csv_new_line;
}
}
else {
echo implode($csv_delimeter, $club_data).$csv_new_line;
}
}
After testing with the example xml data, it generated the following type of output:
Green Riders,Free,James F.,CEO
Green Riders,Free,Helen D., Associate Director
Broken Dice,Paid,Patrick B., CEO
You can set different values based on your scenario for:
$csv_delimeter = ",";
$csv_new_line = "\n";
There are no strict rules for CSV output: the delimiter can be ",", ";" or "|", and the line ending can also be "\r\n".
The code prints the CSV rows one by one on the fly. If you want to save the CSV data to a file instead, a better approach is to build the entire array first and write it in one go (disk access is costly), unless the XML data is large. You will find plenty of simple PHP array-to-CSV function examples on the net.
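Saving collected rows to a file could look roughly like this. It is only a sketch: the rows are the sample data from the question, the temp file stands in for whatever output path you use, and fputcsv() is used instead of implode() so that quoting is handled by PHP.

```php
<?php
// Rows collected first (sample data taken from the question's XML).
$rows = [
    ['Green Riders', 'Free', 'James F.', 'CEO'],
    ['Green Riders', 'Free', 'Helen D.', 'Associate Director'],
    ['Broken Dice', 'Paid', 'Patrick B.', 'CEO'],
];

// Hypothetical output location; a temp file keeps the demo self-contained.
$path = tempnam(sys_get_temp_dir(), 'clubs');
$fh = fopen($path, 'w');
foreach ($rows as $row) {
    // Separator, enclosure and escape passed explicitly (the usual defaults).
    fputcsv($fh, $row, ',', '"', '\\');
}
fclose($fh);

$written = file_get_contents($path);
unlink($path);
echo $written;
```

Note that fputcsv() encloses fields containing spaces in quotes, so the file differs slightly from the implode() output while holding the same data.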
It is not really possible. XML is a nested structure, and a generic script lacks the information about how that nesting should map onto flat rows. You can define some default mapping for XML structures, but that gets really complex really fast. So it is far easier (and less time-consuming) to define the mapping by hand.
A Reusable Conversion
function readXMLAsRecords(string $xml, array $map) {
// load the xml
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
// iterate the elements defining the rows
foreach ($xpath->evaluate($map['row']) as $row) {
$line = [];
// get the field values from the current $row
foreach ($map['columns'] as $name => $expression) {
$line[$name] = $xpath->evaluate($expression, $row);
}
// return a line
yield $line;
}
}
The Mapping
With DOMXpath::evaluate() Xpath expressions can return strings. So we need one expression that returns the boardMember nodes and a list of expressions for the fields.
$map = [
'row' => '/root/club/boardMember',
'columns' => [
'club_name' => 'string(parent::club/name)',
'club_membership' => 'string(parent::club/membership)',
'board_member_name' => 'string(name)',
'board_member_position' => 'string(position)'
]
];
To CSV
readXMLAsRecords() returns a generator, you can use foreach on it:
$csv = fopen('php://stdout', 'w');
fputcsv($csv, array_keys($map['columns']));
foreach (readXMLAsRecords($xml, $map) as $record) {
fputcsv($csv, $record);
}
Output:
club_name,club_membership,board_member_name,board_member_position
"Green Riders",Free,"James F.",CEO
"Green Riders",Free,"Helen D.","Associate Director"
"Broken Dice",Paid,"Patrick B.",CEO
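To see why the expressions in the mapping are wrapped in string(), here is a small sketch of how DOMXPath::evaluate() behaves. The XML is a shortened version of the question's document; nothing else is assumed.

```php
<?php
$document = new DOMDocument();
$document->loadXML(
    '<root><club><name>Broken Dice</name>'
    . '<boardMember><name>Patrick B.</name></boardMember></club></root>'
);
$xpath = new DOMXPath($document);

// A location path returns a DOMNodeList ...
$members = $xpath->evaluate('/root/club/boardMember');
// ... while string() returns a plain PHP string, relative to the context node.
$memberName = $xpath->evaluate('string(name)', $members->item(0));
$clubName = $xpath->evaluate('string(parent::club/name)', $members->item(0));
// string() of a path that matches nothing is an empty string, not an error.
$missing = $xpath->evaluate('string(position)', $members->item(0));

var_dump($members->length, $memberName, $clubName, $missing);
```

The last case is what keeps the CSV columns aligned when an optional element is absent from a row.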
I'm trying to import many XML files whose names I do not know.
I use this code:
foreach(glob('OLD/*.xml') as $file) {
$url= basename($file) . ', ';
$all_urls = array($url);
foreach ($all_urls as $url) {
$xml = simplexml_load_file($url);
I have a lot of files like agency.xml, annunci_324.xml, annunci_321.xml, etc.
I only need the files that begin with annunci and end with .xml. I also need to remove the trailing comma from the last value and use the result in the last foreach. How can I do it?
I think you can check whether the file name starts with annunci using the strpos function:
if(strpos(basename($file), 'annunci') === 0)
{
    //we found a file with a name we are interested in.
}
Now you can build your array directly without caring about commas:
$all_urls = array();
foreach(glob('OLD/*.xml') as $file)
{
    // keep only files whose name starts with "annunci"
    if(strpos(basename($file), 'annunci') === 0)
    {
        // store the full path so simplexml_load_file() can find the file
        $all_urls[] = $file;
    }
}
This way $all_urls is an array of all the files starting with annunci, and you can loop over it and load each one with simplexml_load_file().
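If the files can be selected by name alone, the filtering could also be left to glob() itself, since its pattern accepts a prefix before the wildcard. A minimal sketch, where a temporary directory with made-up files stands in for OLD/:

```php
<?php
// Demo directory standing in for OLD/ (file names taken from the question).
$dir = sys_get_temp_dir() . '/old_' . uniqid();
mkdir($dir);
foreach (['agency.xml', 'annunci_321.xml', 'annunci_324.xml'] as $name) {
    file_put_contents("$dir/$name", '<root/>');
}

// The wildcard pattern already restricts the result to annunci*.xml.
$all_urls = glob($dir . '/annunci*.xml');

foreach ($all_urls as $file) {
    // $file is the full path, so loading works from any working directory.
    $xml = simplexml_load_file($file);
    echo basename($file), ' => ', $xml->getName(), "\n";
}

// Clean up the demo files.
array_map('unlink', glob($dir . '/*.xml'));
rmdir($dir);
```

In the original setting the pattern would simply be 'OLD/annunci*.xml'.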
Hello I have the following xml results that are returned from a remote site
<ResultSet totalResultsAvailable="1">
<Product orderNo="5321" partNo="A2345" truckable="1">
<Manufacturer id="22">WIDGET 4 U</Manufacturer>
<Model id="356">ACME 500</Model>
<Years>95-98</Years>
<ProductType id="23" categoryID="4">Cool Red Widgest</ProductType>
<Material id="6">shiny stuff</Material>
<PartNo>A2345</PartNo>
<Code/>
</Product>
</ResultSet>
I am simply trying to pull the XML results and place them in a new CSV file, but I get an error:
Warning: Invalid argument supplied for foreach() in /home/myServer/public_html/xmlParser2.php on line 14
Here is my code:
<?
echo 'Write XML to CSV';
$basenameLong ='http://thisIsTheURLto.com/myFeed/?key=123456789&mode=getProducts';
$fileNameCSV = 'xmlParseContent.csv';
$feedContent = '';
echo '<br/>Starting......';
$feedContent = file_get_contents($basenameLong);
$fh = fopen($fileNameCSV, 'w+'); //create new CSV file if not exists else append
foreach($feedContent->ResultSet->Product as $product) {
fputcsv($f, get_object_vars($product),',','"');
}
fclose($fh);
?>
I know this code is very elementary, but can you help me find the issue? I am a novice and I don't see it.
This line is wrong, because it writes to $f while the file was opened as $fh:
fputcsv($f, get_object_vars($product),',','"');
Pass the handle you actually opened:
fputcsv($fh, get_object_vars($product), ',', '"');
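Your problem is that you never parse the XML: file_get_contents only returns the document as a string. Replace file_get_contents with simplexml_load_file and it should work. Note that the object returned represents the root element <ResultSet>, so the loop should iterate over $feedContent->Product rather than $feedContent->ResultSet->Product.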
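Put together, the parsing and writing could look like the following sketch. The feed is inlined (shortened from the question) so the code runs without network access, and php://memory stands in for the CSV file. The child elements are cast to strings one by one, which is an alternative to get_object_vars() chosen here because these elements carry attributes.

```php
<?php
// Sample feed from the question, inlined for the demo.
$feed = <<<XML
<ResultSet totalResultsAvailable="1">
  <Product orderNo="5321" partNo="A2345" truckable="1">
    <Manufacturer id="22">WIDGET 4 U</Manufacturer>
    <Model id="356">ACME 500</Model>
    <Years>95-98</Years>
    <PartNo>A2345</PartNo>
    <Code/>
  </Product>
</ResultSet>
XML;

$resultSet = simplexml_load_string($feed);
if ($resultSet === false) {
    exit('Could not parse the feed.');
}

$fh = fopen('php://memory', 'w+');
// $resultSet is the <ResultSet> root, so <Product> is a direct child.
foreach ($resultSet->Product as $product) {
    $fields = [];
    foreach ($product->children() as $child) {
        $fields[$child->getName()] = (string) $child; // text content only
    }
    fputcsv($fh, $fields, ',', '"', '\\');
}
rewind($fh);
$csv = stream_get_contents($fh);
fclose($fh);
echo $csv;
```

For the real feed, simplexml_load_file($basenameLong) would replace simplexml_load_string($feed), and fopen($fileNameCSV, 'w') would replace the memory stream.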
Using PHP to convert XML to CSV is fairly easy, at least in the situations I've encountered so far. In my case, it would save me significant work if I could simply convert structured XML data into CSV data. Typically, I want to convert only the data in a particular xpath of the original XML document. The PHP function below will load an XML file and convert the elements in the specified xpath to simple csv data.
function xml2csv ($xmlFile, $xPath) {
    // Start with an empty result so the first append does not hit an undefined variable
    $csvData = '';
    // Load the XML file
    $xml = simplexml_load_file($xmlFile);
    // Jump to the specified xpath
    $path = $xml->xpath($xPath);
    // Loop through the specified xpath
    foreach($path as $item) {
        $fields = array();
        // Loop through the elements in this xpath
        foreach($item as $key => $value) {
            // Double any embedded quotes so the field stays valid CSV
            $fields[] = '"' . str_replace('"', '""', trim((string) $value)) . '"';
        }
        // Join the fields with commas and add an LF
        $csvData .= implode(',', $fields) . "\n";
    }
    // Return the CSV data
    return $csvData;
}
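The function above builds the quoting by hand. An alternative sketch is to delegate it to fputcsv() on an in-memory stream. The name xml_string_to_csv is made up, and the variant takes an XML string rather than a file name only so that the demo is self-contained.

```php
<?php
// Hypothetical variant of the function above; takes an XML string for the demo.
function xml_string_to_csv(string $xmlString, string $xPath): string
{
    $xml = simplexml_load_string($xmlString);
    $buffer = fopen('php://memory', 'w+');
    foreach ($xml->xpath($xPath) as $item) {
        $row = [];
        foreach ($item->children() as $value) {
            $row[] = trim((string) $value);
        }
        // Quoting and escaping are handled by PHP.
        fputcsv($buffer, $row, ',', '"', '\\');
    }
    rewind($buffer);
    $csv = stream_get_contents($buffer);
    fclose($buffer);
    return $csv;
}

$csv = xml_string_to_csv(
    '<phones><phone><title>"Apple iPhone 5S"</title><color>black</color></phone></phones>',
    '/phones/phone'
);
echo $csv;
```

Unlike the hand-built version, fputcsv() only quotes fields that need it, so plain values such as black are written bare.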
I have researched for an answer and mainly with the help of answers in this question Convert Tab delimited text file to XML, pieced together the following script to read a CSV file line by line and then convert the results to an XML file.
The CSV file has lines with three or more cells in this manner:
John Doe john_doe@email.com 06/07/2012 01:45
When run in the interactive PHP shell, the following script ignores the first line of the file and spits out everything, two lines at a time, inside the first XML tag:
<?php
error_reporting(E_ALL | E_STRICT);
ini_set('display_errors', true);
ini_set('auto_detect_line_endings', true);
$xmlWriter = new XMLWriter();
$xmlWriter->openUri('/path/to/destination.xml');
$xmlWriter->setIndent(true);
$xmlWriter->startDocument('1.0', 'UTF-8');
$xmlWriter->startElement('root');
$tsvFile = new SplFileObject('/path/to/destination.csv');
$tsvFile->setFlags(SplFileObject::READ_CSV);
$tsvFile->setCsvControl("\t");
foreach ($tsvFile as $line => $row) {
if($line > 0 && $line !== ' ') {
$xmlWriter->startElement('item');
$xmlWriter->writeElement('name', $row[0]);
$xmlWriter->writeElement('email', $row[1]);
$xmlWriter->writeElement('date', $row[2]);
$xmlWriter->endElement();
}
}
$xmlWriter->endElement();
$xmlWriter->endDocument(); ?>
To resolve this, I tried the solution here: tab-delimited string to XML with PHP
The following is the modified script:
<?php
error_reporting(E_ALL | E_STRICT);
ini_set('display_errors', true);
ini_set('auto_detect_line_endings', true);
$xmlWriter = new XMLWriter();
$xmlWriter->openUri('/path/to/destination.xml');
$xmlWriter->setIndent(true);
$xmlWriter->startDocument('1.0', 'UTF-8');
$xmlWriter->startElement('root');
$tsvFile = new SplFileObject('/path/to/destination.csv');
$tsvFile->setFlags(SplFileObject::READ_CSV);
$tsvFile->setCsvControl("\t");
$lines = explode("\n", $tsvFile);
$tsvData = array();
foreach ($lines as $line ) {
if($line > 0 ) {
$tsvData[] = str_getcsv($line, "\t");
$tsvData[] = str_getcsv($line, "\t");
foreach ($tsvData as $row) {
$xmlWriter->writeElement('name', $row[0]);
$xmlWriter->writeElement('email', $row[1]);
$xmlWriter->writeElement('date', $row[2]);
$xmlWriter->endElement();
}
}
}
$xmlWriter->endElement();
$xmlWriter->endDocument();?>
This script creates the xml file but unfortunately produces no output inside of it.
Would someone be able to help me by pointing out where I am going wrong? I am no expert with this but trying my hardest to learn.
Your help is very much appreciated!
You seem to be making hard work of this.
XML can, at the end of the day, be expressed as a text file.
Why not just create either a string or a file?
Read in the CSV and split the columns. Write the CSV data out as XML, with the appropriate tags, into that string or file. Then load that file or string into a DOM object.
Everywhere in your script that you have "\t", you are specifying a tab character (as you've modified something for TSV files). If you're trying to convert a CSV file (comma-separated list), as a first step, try replacing all instances of "\t" with ",".
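For reference, a tab-separated file can be read row by row and written out with XMLWriter roughly as in the sketch below. The second data row is made up, the temp file stands in for the real input, and openMemory() is used instead of openUri() so the result can be inspected directly. Every non-empty row becomes an item; if the real file starts with a header row, the first row would need to be skipped.

```php
<?php
// Demo input standing in for the real tab-separated file.
$path = tempnam(sys_get_temp_dir(), 'tsv');
file_put_contents(
    $path,
    "John Doe\tjohn_doe@email.com\t06/07/2012 01:45\n"
    . "Jane Roe\tjane_roe@email.com\t07/07/2012 09:30\n"
);

$tsvFile = new SplFileObject($path);
// READ_AHEAD, SKIP_EMPTY and DROP_NEW_LINE together suppress the empty
// row that would otherwise appear after the final line break.
$tsvFile->setFlags(
    SplFileObject::READ_CSV | SplFileObject::READ_AHEAD
    | SplFileObject::SKIP_EMPTY | SplFileObject::DROP_NEW_LINE
);
$tsvFile->setCsvControl("\t", '"', '\\');

$xmlWriter = new XMLWriter();
$xmlWriter->openMemory(); // openUri('/path/to/destination.xml') would write a file
$xmlWriter->setIndent(true);
$xmlWriter->startDocument('1.0', 'UTF-8');
$xmlWriter->startElement('root');
foreach ($tsvFile as $row) {
    if (!is_array($row) || count($row) < 3) {
        continue; // skip incomplete lines
    }
    // Each row gets its own <item>, opened and closed inside the loop.
    $xmlWriter->startElement('item');
    $xmlWriter->writeElement('name', $row[0]);
    $xmlWriter->writeElement('email', $row[1]);
    $xmlWriter->writeElement('date', $row[2]);
    $xmlWriter->endElement();
}
$xmlWriter->endElement();
$xmlWriter->endDocument();
$xmlString = $xmlWriter->outputMemory();

$tsvFile = null; // release the file handle before deleting the demo file
unlink($path);
echo $xmlString;
```

The important part compared to the modified script in the question is that the SplFileObject is iterated directly, rather than being passed to explode(), and that startElement('item') and endElement() are paired inside the loop.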