I have searched for an answer and, mainly with the help of the answers to the question "Convert Tab delimited text file to XML", pieced together the following script to read a CSV file line by line and convert the results to an XML file.
The CSV file has lines with three or more cells in this manner:
John Doe john_doe#email.com 06/07/2012 01:45
When run in the interactive PHP shell, the following script ignores the first line of the file and writes everything, two lines at a time, inside the first XML tag:
<?php
error_reporting(E_ALL | E_STRICT);
ini_set('display_errors', true);
ini_set('auto_detect_line_endings', true);
$xmlWriter = new XMLWriter();
$xmlWriter->openUri('/path/to/destination.xml');
$xmlWriter->setIndent(true);
$xmlWriter->startDocument('1.0', 'UTF-8');
$xmlWriter->startElement('root');
$tsvFile = new SplFileObject('/path/to/destination.csv');
$tsvFile->setFlags(SplFileObject::READ_CSV);
$tsvFile->setCsvControl("\t");
foreach ($tsvFile as $line => $row) {
if($line > 0 && $line !== ' ') {
$xmlWriter->startElement('item');
$xmlWriter->writeElement('name', $row[0]);
$xmlWriter->writeElement('email', $row[1]);
$xmlWriter->writeElement('date', $row[2]);
$xmlWriter->endElement();
}
}
$xmlWriter->endElement();
$xmlWriter->endDocument(); ?>
To resolve this, I tried the solution here: tab-delimited string to XML with PHP
The following is the modified script:
<?php
error_reporting(E_ALL | E_STRICT);
ini_set('display_errors', true);
ini_set('auto_detect_line_endings', true);
$xmlWriter = new XMLWriter();
$xmlWriter->openUri('/path/to/destination.xml');
$xmlWriter->setIndent(true);
$xmlWriter->startDocument('1.0', 'UTF-8');
$xmlWriter->startElement('root');
$tsvFile = new SplFileObject('/path/to/destination.csv');
$tsvFile->setFlags(SplFileObject::READ_CSV);
$tsvFile->setCsvControl("\t");
$lines = explode("\n", $tsvFile);
$tsvData = array();
foreach ($lines as $line ) {
if($line > 0 ) {
$tsvData[] = str_getcsv($line, "\t");
$tsvData[] = str_getcsv($line, "\t");
foreach ($tsvData as $row) {
$xmlWriter->writeElement('name', $row[0]);
$xmlWriter->writeElement('email', $row[1]);
$xmlWriter->writeElement('date', $row[2]);
$xmlWriter->endElement();
}
}
}
$xmlWriter->endElement();
$xmlWriter->endDocument();?>
This script creates the xml file but unfortunately produces no output inside of it.
Would someone be able to help me by pointing out where I am going wrong? I am no expert with this but trying my hardest to learn.
Your help is very much appreciated!
You seem to be making hard work of this.
XML, at the end of the day, can be expressed as a text file.
Why not just create either a string or a file?
Read in the CSV, split the columns. Write out the CSV as XML into that string with the appropriate tags. Then load that file or string into a DOM object.
Everywhere in your script that you have "\t", you are specifying a tab character (as you've modified something for TSV files). If you're trying to convert a CSV file (comma-separated list), as a first step, try replacing all instances of "\t" with ",".
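For what it's worth, here is a minimal sketch of that read, split, write flow, assuming three columns per line as in the question. The function name tsv_to_xml() and the sample address are made up for illustration; it builds the XML in memory with XMLWriter, and the delimiter is a parameter so the same code covers tab and comma files.

```php
<?php
// Sketch only: tsv_to_xml() is a hypothetical helper, not code from the question.
function tsv_to_xml(string $text, string $delimiter = "\t"): string
{
    $writer = new XMLWriter();
    $writer->openMemory();
    $writer->setIndent(true);
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('root');

    // Split on any common line ending, then parse each line on its own.
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        if (trim($line) === '') {
            continue; // skip blank lines instead of emitting empty items
        }
        $row = str_getcsv($line, $delimiter, '"', '\\');
        $writer->startElement('item');
        $writer->writeElement('name', $row[0] ?? '');
        $writer->writeElement('email', $row[1] ?? '');
        $writer->writeElement('date', $row[2] ?? '');
        $writer->endElement(); // closes <item>
    }

    $writer->endElement(); // closes <root>
    $writer->endDocument();
    return $writer->outputMemory();
}

// Assumed sample line in the shape shown in the question.
$xml = tsv_to_xml("John Doe\tjohn_doe@example.com\t06/07/2012 01:45\n");
echo $xml;
```

For a comma-separated file you would call tsv_to_xml($text, ","), and write the returned string out with file_put_contents().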
I'm coding a plugin that runs everyday at 5am. It combines multiple csv files (That have a txt extension).
Currently, it is working... HOWEVER, the output format is incorrect.
The input will look like this:
"","","","","email#gmail.com","PARK PLACE 109 AVE","SOME RANDOM DATA","","","",""
And so on; this is only a partial row.
The output of this code does not return the same format. It produces something like this, without the quotes around columns that have no data:
,,,,email#gmail.com,"PARK PLACE 109 AVE","SOME RANDOM DATA",,,,
Here is the part of the function that combines everything:
function combine_and_email_csv_files() {
// Get the current time and date
$now = new DateTime();
$date_string = $now->format('Y-m-d_H-i-s');
// Get the specified directories
$source_directory = get_option('csv_file_combiner_source_directory');
$destination_directory = get_option('csv_file_combiner_destination_directory');
// Load the CSV files from the source directory
$csv_files = glob("$source_directory/*.txt");
// Create an empty array to store the combined CSV data
$combined_csv_data = array();
// Loop through the CSV files
foreach ($csv_files as $file) {
// Load the CSV data from the file
$csv_data = array_map('str_getcsv', file($file));
// Add the CSV data to the combined CSV data array
$combined_csv_data = array_merge($combined_csv_data, $csv_data);
}
// Create the combined CSV file
$combined_csv_file = fopen("$destination_directory/$date_string.txt", 'w');
// Write the combined CSV data to the file
foreach ($combined_csv_data as $line) {
fputcsv($combined_csv_file, $line);
}
// Close the combined CSV file
fclose($combined_csv_file);
}
No matter, what I've tried... it's not working. I'm missing something simple I know.
Thank you Nigel!
So this thread, Forcing fputcsv to Use Enclosure For *all* Fields helped me get there....
Using fputs instead of fputcsv and forcing "" on null values is the short answer for me. Works beautifully... code is below:
function combine_and_email_csv_files() {
// Get the current time and date
$now = new DateTime();
$date_string = $now->format('Y-m-d_H-i-s');
// Get the specified directories
$source_directory = get_option('csv_file_combiner_source_directory');
$destination_directory = get_option('csv_file_combiner_destination_directory');
// Load the CSV files from the source directory
$csv_files = glob("$source_directory/*.txt");
// Create an empty array to store the combined CSV data
$combined_csv_data = array();
// Loop through the CSV files
foreach ($csv_files as $file) {
// Load the CSV data from the file
$csv_data = array_map('str_getcsv', file($file));
// Add the CSV data to the combined CSV data array
$combined_csv_data = array_merge($combined_csv_data, $csv_data);
}
// Create the combined CSV file
$combined_csv_file = fopen("$destination_directory/$date_string.txt", 'w');
// Write the combined CSV data to the file
foreach ($combined_csv_data as $line) {
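// Enclose each value in double quotes; empty values become ""
$line = array_map(function($val) {
// Compare against '' rather than using empty(), which would also blank out "0"
if ($val === null || $val === '') {
return "\"\"";
}
// Double any embedded quotes so the field stays valid CSV
return "\"" . str_replace("\"", "\"\"", $val) . "\"";
}, $line);
// Convert the line array to a CSV formatted string
$line_string = implode(',', $line) . "\n";
// Write the string to the file
fputs($combined_csv_file, $line_string);
}
// Close the combined CSV file
fclose($combined_csv_file);
}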
Thank you Sammitch
After much haggling with this problem... Sammitch pointed out: why not just concatenate the files? Simplicity is the ultimate sophistication... right?
*Note: this will only work for my specific circumstance. All I'm doing now is concatenating the files, checking that each file ends with a newline, and skipping the CSV manipulation entirely.
Code below:
function combine_and_email_csv_files() {
// Get the current time and date
$now = new DateTime();
$date_string = $now->format('Y-m-d_H-i-s');
// Get the specified directories
$source_directory = get_option('csv_file_combiner_source_directory');
$destination_directory = get_option('csv_file_combiner_destination_directory');
// Load the files from the source directory
$files = glob("$source_directory/*.txt");
// Create the combined file
$combined_file = fopen("$destination_directory/$date_string.txt", 'w');
// Loop through the files
foreach ($files as $file) {
// Read the contents of the file
$contents = file_get_contents($file);
// Ensure that the file ends with a newline character
if (substr($contents, -1) != "\n") {
$contents .= "\n";
}
// Write the contents of the file to the combined file
fwrite($combined_file, $contents);
}
// Close the combined file
fclose($combined_file);
}
I have a JSON file which contains multiple JSON objects, one per line.
Example
{"t":"abc-1","d":"2017-12-29 12:42:53"}
{"t":"abc-2","d":"2017-12-29 12:43:05"}
{"t":"abc-3","d":"2017-12-30 14:42:09"}
{"t":"code-4","d":"2017-12-30 14:42:20"}
I want to read this file and store its contents in a database, but I can't convert the JSON into a PHP array that I can then store.
I tried the json_decode function, but it's not working. I searched for this, but every link just says to use json_decode. Below is my code:
$filename = "folder/filename.json";
$data = file_get_contents($filename);
echo $data;
$tags = json_decode($data, true);
echo"<pre>";print_r($tags);exit;
$data is echoed but not the $tags.
Thanks in advance.
Make array of objects and use it later
$j = array_map('json_decode', file('php://stdin'));
print_r($j);
demo
If it's only four lines you can explode and json_decode each line and add it to an array.
$s = '{"t":"abc-1","d":"2017-12-29 12:42:53"}
{"t":"abc-2","d":"2017-12-29 12:43:05"}
{"t":"abc-3","d":"2017-12-30 14:42:09"}
{"t":"code-4","d":"2017-12-30 14:42:20"}';
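$json = array();
// Split on "\n" explicitly; PHP_EOL is "\r\n" on Windows and may not match the data
$arr = explode("\n", $s);
foreach ($arr as $line) {
$json[] = json_decode($line, true);
}
var_dump($json);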
https://3v4l.org/97m0E
Multiple objects in a row should be enclosed in a JSON array and separated with commas, like array elements. So you need a [ at the start and a ] at the end of the file. Also, you could close the pre tag.
Either you should fix the file generating that 'json' or you can use fgets to get one line at a time, and use json decode on every line
As others have pointed out, the JSON you shared isn't valid as a single document, and I assume it is stored in your file in the same fashion. I would suggest reading the file line by line and decoding each line.
$handle = fopen("folder/filename.json", "r");
if ($handle) {
$tags = array();
while (($line = fgets($handle)) !== false) {
// Decode each line separately and collect the results
$tags[] = json_decode($line, true);
}
fclose($handle);
echo "<pre>"; print_r($tags); echo "</pre>";
} else {
// error opening the file.
}
Assuming a file called `filename.json` contains the following lines
{"t":"abc-1","d":"2017-12-29 12:42:53"}
{"t":"abc-2","d":"2017-12-29 12:43:05"}
{"t":"abc-3","d":"2017-12-30 14:42:09"}
{"t":"code-4","d":"2017-12-30 14:42:20"}
So each one is a separate json entity
$filename = "folder/filename.json";
$lines=file( $filename );
foreach( $lines as $line ){
$obj=json_decode( $line );
$t=$obj->t;
$d=$obj->d;
/* do something with constituent pieces */
echo $d,$t,'<br />';
}
Your JSON is invalid, as it has multiple root elements
Fixing it like the following should work (note the [, ] and commas):
[
{"t":"abc-1","d":"2017-12-29 12:42:53"},
{"t":"abc-2","d":"2017-12-29 12:43:05"},
{"t":"abc-3","d":"2017-12-30 14:42:09"},
{"t":"code-4","d":"2017-12-30 14:42:20"}
]
If you cannot influence how the JSON file is created, you will need to create your own reader, as PHP is not built to support invalid formatting. You could separate the file by new lines and parse each one individually.
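A minimal sketch of such a line-by-line reader, in case it helps. The helper name decode_json_lines() is my own invention, and it takes the file contents as a string (for example from file_get_contents()) so it can be tried without a file. It skips blank lines and reports which line failed to decode.

```php
<?php
// Sketch only: decode_json_lines() is a hypothetical helper name.
function decode_json_lines(string $text): array
{
    $rows = [];
    foreach (preg_split('/\r\n|\r|\n/', $text) as $number => $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // tolerate blank lines and a trailing newline
        }
        $decoded = json_decode($line, true);
        if (json_last_error() !== JSON_ERROR_NONE) {
            throw new RuntimeException(
                'Invalid JSON on line ' . ($number + 1) . ': ' . json_last_error_msg()
            );
        }
        $rows[] = $decoded;
    }
    return $rows;
}

// Two lines in the shape shown in the question.
$sample = '{"t":"abc-1","d":"2017-12-29 12:42:53"}' . "\n"
        . '{"t":"abc-2","d":"2017-12-29 12:43:05"}' . "\n";
$tags = decode_json_lines($sample);
echo count($tags), ' rows, first t = ', $tags[0]['t'], "\n";
```

Each element of $tags is then an associative array with the keys t and d, ready to be bound to an insert statement.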
We have written the following PHP script to convert a CSV file to an XML file, but it gets stuck and never leaves the while loop to reach saveXML.
The CSV file is around 1 GB and has around 100,000 rows.
Because of the large number of rows, it is not working.
My question is: how can we modify the following code so that it works for a large file?
<?php
$delimit = "," ;
$row_count = 0 ;
$inputFilename = "feed.csv" ;
$outputFilename = 'output.xml';
$inputFile = fopen($inputFilename, 'rt');
$headers = fgetcsv($inputFile);
$doc = new DomDocument();
$doc->formatOutput = true;
$root = $doc->createElement('rows');
$root = $doc->appendChild($root);
while (($row = fgetcsv($inputFile)) !== FALSE)
{
$container = $doc->createElement('row');
foreach ($headers as $i => $header)
{
$arr = explode($delimit, $header);
foreach ($arr as $j => $ar)
{
$child = $doc->createElement(preg_replace("/[^A-Za-z0-9]/","",$ar));
$child = $container->appendChild($child);
$whole = explode($delimit, $row[$i]);
$value = $doc->createTextNode(ltrim( rtrim($whole[$j], '"') ,'"'));
$value = $child->appendChild($value);
}
}
$root->appendChild($container);
echo "." ;
}
echo "Saving the XML now" ;
$result = $doc->saveXML();
echo "Writing to XML file now" ;
$handle = fopen($outputFilename, "w");
fwrite($handle, $result);
fclose($handle);
return $outputFilename;
?>
Edited:
In php.ini the memory_limit and execution time is set for unlimited & maximum. I am executing using command line.
As you noticed, you run into resource problems with such big input and output.
The input handling you use, fgetcsv(), is already quite efficient, as it reads one line at a time.
The output is the problem in this case. You store the whole 1 GB of raw text in a DOMDocument object, which adds considerable overhead to the memory needed.
But according to your code, you only write the XML back to a file, so you don't really need it as a DOMDocument at runtime.
The simplest solution would be to build the XML as a string and write it to the output file for each line of the CSV: open the handle for the output file with 'a' (fopen($outputFilename, "a")), write the XML header before the loop, fwrite every CSV-to-XML-converted element per loop iteration, and write the XML footer after the loop.
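Roughly, the header / per-row fwrite / footer idea could look like the sketch below. The function name csv_to_xml_stream() and the rows/row tag names are assumptions for illustration, and it works on already-opened stream handles so the demo can run on in-memory streams instead of real files. Values are escaped with htmlspecialchars() so that a stray & or < cannot break the markup.

```php
<?php
// Sketch only: csv_to_xml_stream() is a hypothetical helper. One <row> is
// written per CSV line, so only a single row is held in memory at a time.
function csv_to_xml_stream($in, $out): int
{
    $headers = fgetcsv($in, null, ',', '"', '\\');
    // Strip everything except letters and digits, as the question's code does.
    $tags = array_map(
        fn($h) => preg_replace('/[^A-Za-z0-9]/', '', (string) $h) ?: 'field',
        $headers
    );

    fwrite($out, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<rows>\n");
    $count = 0;
    while (($row = fgetcsv($in, null, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv returns [null] for a blank line
        }
        $xml = '  <row>';
        foreach ($tags as $i => $tag) {
            // Escape &, < and > in the cell value.
            $value = htmlspecialchars($row[$i] ?? '', ENT_XML1 | ENT_NOQUOTES, 'UTF-8');
            $xml .= "<$tag>$value</$tag>";
        }
        fwrite($out, $xml . "</row>\n");
        $count++;
    }
    fwrite($out, "</rows>\n");
    return $count;
}

// Demo on in-memory streams; with real files you would fopen() them instead.
$in = fopen('php://memory', 'w+');
fwrite($in, "id,name\n1,Tom & Jerry\n");
rewind($in);
$out = fopen('php://memory', 'w+');
$written = csv_to_xml_stream($in, $out);
rewind($out);
$result = stream_get_contents($out);
echo $result;
```

Note that a header cell starting with a digit would still give an invalid element name; the sketch does not try to handle that.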
It's most probably the (mis)use of DOMDocument that causes your memory issues (as already answered by @cypherabe).
But instead of the proposed string concatenation solution, I would urge you to take a look at XMLWriter: http://php.net/manual/en/book.xmlwriter.php
The XmlWriter extension represents a writer that provides a non-cached, forward-only means of generating streams or files containing XML data.
This extension can be used in an object oriented style or a procedural one.
It's already bundled with PHP from version 5.2.1
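As a rough illustration of how that could look for the CSV in the question (a sketch, not a drop-in replacement): the function name csv_to_xml_writer(), the flush interval of 1000 rows and the demo data are all assumptions of mine. The writer is flushed periodically so its buffer does not grow with the file.

```php
<?php
// Sketch only: csv_to_xml_writer() is a hypothetical helper name.
function csv_to_xml_writer($in, string $outputUri): int
{
    $writer = new XMLWriter();
    $writer->openUri($outputUri);
    $writer->setIndent(true);
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('rows');

    $headers = fgetcsv($in, null, ',', '"', '\\');
    $count = 0;
    while (($row = fgetcsv($in, null, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // blank line in the CSV
        }
        $writer->startElement('row');
        foreach ($headers as $i => $header) {
            // Same header clean-up as in the question's code.
            $name = preg_replace('/[^A-Za-z0-9]/', '', (string) $header);
            $writer->writeElement($name, $row[$i] ?? '');
        }
        $writer->endElement(); // closes <row>
        if (++$count % 1000 === 0) {
            $writer->flush(); // push buffered output to the file
        }
    }

    $writer->endElement(); // closes <rows>
    $writer->endDocument();
    $writer->flush();
    return $count;
}

// Demo: in-memory CSV in, temporary file out.
$in = fopen('php://memory', 'w+');
fwrite($in, "id,name\n1,Alice\n2,Bob\n");
rewind($in);
$target = tempnam(sys_get_temp_dir(), 'xml');
$count = csv_to_xml_writer($in, $target);
fclose($in);
echo file_get_contents($target);
```

Compared with hand-built strings, XMLWriter takes care of escaping the text content for you.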
http://www.prestatraining.com/12-tips-to-optimise-your-php-ini-file-for-prestashop/
Look at the section Memory and Size Limits (ignore the fact it's about prestashop)
It sounds like your PHP settings on the server are timing out on execution. If you are trying to process a file that is 1GB I wouldn't be surprised if it fails if you have standard PHP.ini settings.
I have some xml files, which have the same elements but only with different information.
First file test.xml
<?xml version="1.0" encoding="UTF-8"?>
<phones>
<phone>
<title>"Apple iPhone 5S"</title>
<price>
<regularprice>500</regularprice>
<saleprice>480</saleprice>
</price>
<color>black</color>
</phone>
</phones>
Second file test1.xml
<?xml version="1.0" encoding="UTF-8"?>
<phones>
<phone>
<title>Nokia Lumia 830</title>
<price>
<regularprice>400</regularprice>
<saleprice>370</saleprice>
</price>
<color>black</color>
</phone>
</phones>
I need to convert some values from these xml files into 1 test.csv file
So I am using this php code
<?php
$filexml1='test.xml';
$filexml2='test1.xml';
//File 1
if (file_exists($filexml1)) {
$xml = simplexml_load_file($filexml1);
$f = fopen('test.csv', 'w');
$headers = array('title', 'color');
$converted_array = array_map("strtoupper", $headers);
fputcsv($f, $converted_array, ',', '"');
foreach ($xml->phone as $phone) {
//$phone->title = trim($phone->title, " ");
// Array of just the components you need...
$values = array(
"title" => (string)$phone->title = trim(str_replace ( "\"", "&quot;", $phone->title ), " "),
"color" => (string)$phone->color
);
fputcsv($f, $values,',','"');
}
fclose($f);
echo "<p>File 1 coverted to .csv sucessfully</p>";
} else {
exit('Failed to open test.xml.');
}
//File 2
if (file_exists($filexml2)) {
$xml = simplexml_load_file($filexml2);
$f = fopen('test.csv', 'a');
//the same code for second file like for the first file
echo "<p>File 2 coverted to .csv sucessfully</p>";
} else {
exit('Failed to open test1.xml.');
}
?>
The output of the test.csv looks this way
TITLE COLOR
Apple iPhone 5S black
Nokia Lumia 830 black
As you can see, I only managed to load each file into a variable, and for each file I have to write an if statement, which makes the script too big. So I am wondering whether it is possible to load all files into an array and process them with one code block, since the XML elements are the same, and output to one .csv file. Essentially I need the same test.csv output, only with less PHP code.
Thanks in advance.
Besides using an array, PHP has more to offer that can make this even simpler. Just as an array can represent a list of your files, other constructs in PHP can do that, too.
For example, as the XML files you have most likely are inside a specific directory and follow some pattern with their filename, those could be easily represented with a GlobIterator:
$inputFiles = new GlobIterator(__DIR__ . '/*.xml');
You could then foreach over them which I'll show in a moment with another example.
Such a list allows you to streamline your processing. That is important because there is a kind of generic formula for many programs: Input, Process, Output. This is also called the IPO or IPO+S model; the S stands for storage. In your case, while you process the input data, you also store it in a new CSV file, which is also the output (after processing is fully done).
When you follow such a generic model, it's easier to structure your code and with a better structure you most often have less code. Even if not, each part of your code is more self-contained and smaller which is most often what you're looking for.
Next to the said list of XML-files I showed at the beginning of the answer with the GlobIterator there are other Iterators that can help to process the XML data.
For example, you've got 1-n XML files that contain 0-n <phone> elements. You know that you want to process any of these <phone> elements, you already exactly know what you want to do with them (extract some data from it). So wouldn't it be great to have a list of all <phone> elements within all XML-files first?
This can be easily done in PHP with the help of a Generator. That is a function that can return values multiple times while it's still "running". This is a simplification, better show some code to illustrate that. Let's say we've got the list of XML files as input and we want all <phone> elements out of it. For sure, you could create an array of all these <phone> elements and process that array later. However, a Generator is able to offer all these <phone> elements directly to be used within a foreach loop:
function extract_phones(Traversable $files) {
foreach ($files as $file) {
$xml = simplexml_load_file($file);
if ($xml === false) {
continue;
}
foreach ($xml->phone as $phone) {
yield $phone;
}
}
}
As this example generator function shows, it goes over all $files, tries to load each one as a SimpleXMLElement and, if successful, iterates over all <phone> elements and yields them.
That means, if the function extract_phones is called within a foreach, that loop will have every <phone> element as SimpleXMLElement:
foreach(extract_phones($inputFiles) as $phone) {
# $phone is a SimpleXMLElement here
}
So now your question asks about creating the CSV file as output. This could be done by creating an SplFileObject to pass the output around and access it while processing. It basically works the same as passing the file handle around, as you do in your question, but it has better semantics that allow the code to be changed more easily later on (you could replace it with another object that behaves the same).
Additionally, I've seen a little detail in your code that is worth discussing first. You're encoding the quotes as HTML entities:
trim(str_replace( "\"", "&quot;", $phone->title ), " ")
You most likely do that because you want HTML entities inside the CSV file. However, the CSV file does not need them. You also want the data in the CSV file to be as generic as possible. Whether the CSV file is later used in an HTML context or in a spreadsheet application should not be your concern when you convert the file format. My suggestion is to leave that out and deal with it in another place, one where it belongs more, and that is later on, e.g. when you use the data from the CSV to create some HTML.
That keeps your conversion and the data clean, and it also removes fiddly details from your processing, which not only make the code more complicated but are very often a place where we introduce flaws into our programs.
I will simply remove it from my example.
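To illustrate what "dealing with it later" might look like, here is a small sketch that escapes the values only at the point where a CSV row is turned into HTML. The helper csv_row_to_html() is a hypothetical name, and the table markup is just an assumed output format.

```php
<?php
// Sketch only: csv_row_to_html() is a hypothetical helper. The CSV keeps the
// raw value; escaping happens where HTML is actually produced.
function csv_row_to_html(array $row): string
{
    $cells = array_map(
        fn($value) => '<td>' . htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8') . '</td>',
        $row
    );
    return '<tr>' . implode('', $cells) . '</tr>';
}

// A row as it would come back from fgetcsv() on the generated file.
$html = csv_row_to_html(['"Apple iPhone 5S"', 'black']);
echo $html, "\n";
```

The same CSV can then feed a spreadsheet unchanged, while the HTML view gets its entities from htmlspecialchars().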
So let's put this all together: get all phones from all XML files and store the fields of interest in the output CSV file:
$files = new GlobIterator(__DIR__ . '/*.xml');
$phones = extract_phones($files);
$output = new SplFileObject('file.csv', 'w');
$output->fputcsv($header = ["title", "color"]);
foreach ($phones as $phone) {
$output->fputcsv(
[
$phone->title,
$phone->color,
]
);
}
This then creates the output file you're looking for (without the HTML-entities):
title,color
"""Apple iPhone 5S""",black
"Nokia Lumia 830",black
All this needs is the generator function I've already shown above, which in itself is also straightforward code. Everything else ships with PHP already. Here is the example code in full:
<?php
/**
* @link http://stackoverflow.com/questions/26074850/convert-multiple-xml-files-to-csv-with-simplexml
*/
function extract_phones(Traversable $files)
{
foreach ($files as $file) {
$xml = simplexml_load_file($file);
if ($xml === false) {
continue;
}
foreach ($xml->phone as $phone) {
yield $phone;
}
}
}
$files = new GlobIterator(__DIR__ . '/*.xml');
$phones = extract_phones($files);
$output = new SplFileObject('file.csv', 'w');
$output->fputcsv($header = ["title", "color"]);
foreach ($phones as $phone) {
$output->fputcsv(
[
$phone->title,
$phone->color,
]
);
}
echo file_get_contents($output->getFilename());
Thanks @Ghost for pointing me in the right direction. So here is my solution.
<?php
$filexml = array ('test.xml', 'test1.xml');
//Headers
$fp = fopen('file.csv', 'w');
$headers = array('title', 'color');
$converted_array = array_map("strtoupper", $headers);
fputcsv($fp, $converted_array, ',', '"');
//XML
foreach ($filexml as $file) {
if (file_exists($file)) {
$xml = simplexml_load_file($file);
foreach ($xml->phone as $phone) {
$values = array(
"title" => (string)$phone->title = trim(str_replace ( "\"", "&quot;", $phone->title ), " "),
"color" => (string)$phone->color
);
fputcsv($fp, $values, ',', '"');
}
echo $file . ' converted to .csv sucessfully' . '<br>';
} else {
echo $file . ' was not found' . '<br>';
}
}
fclose($fp);
?>
Hello I have the following xml results that are returned from a remote site
<ResultSet totalResultsAvailable="1">
<Product orderNo="5321" partNo="A2345" truckable="1">
<Manufacturer id="22">WIDGET 4 U</Manufacturer>
<Model id="356">ACME 500</Model>
<Years>95-98</Years>
<ProductType id="23" categoryID="4">Cool Red Widgest</ProductType>
<Material id="6">shiny stuff</Material>
<PartNo>A2345</PartNo>
<Code/>
</Product>
</ResultSet>
I am simply trying to pull the XML results and place them in a new CSV file, but I get an error:
Warning: Invalid argument supplied for foreach() in /home/myServer/public_html/xmlParser2.php on line 14
Here is my code:
<?
echo 'Write XML to CSV';
$basenameLong ='http://thisIsTheURLto.com/myFeed/?key=123456789&mode=getProducts';
$fileNameCSV = 'xmlParseContent.csv';
$feedContent = '';
echo '<br/>Starting......';
$feedContent = file_get_contents($basenameLong);
$fh = fopen($fileNameCSV, 'w+'); //create new CSV file if not exists else append
foreach($feedContent->ResultSet->Product as $product) {
fputcsv($f, get_object_vars($product),',','"');
}
fclose($fh);
?>
I know this code is very elementary, but can you help me find the issue? I am a novice and I don't see it.
This line is wrong :
fputcsv($f, get_object_vars($product),',','"');
if you want to put blank values, try doing this :
fputcsv($f, get_object_vars($product),'','','');
Your problem is that you never parse your XML file. Replace file_get_contents with simplexml_load_file and it should work.
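A sketch of what the parsed version might look like, using a trimmed copy of the XML from the question as an inline string so it runs on its own. Two further things, as far as I can tell from the posted code: after parsing, the returned object already is the <ResultSet> root, so the loop goes over $xml->Product directly, and the handle is opened as $fh, so that is what fputcsv needs to receive. Which fields to export is my assumption.

```php
<?php
// Sketch only: the XML is inlined here; for the live feed you would call
// simplexml_load_file() with the feed URL instead.
$feed = <<<XML
<ResultSet totalResultsAvailable="1">
  <Product orderNo="5321" partNo="A2345" truckable="1">
    <Manufacturer id="22">WIDGET 4 U</Manufacturer>
    <Model id="356">ACME 500</Model>
    <Years>95-98</Years>
    <PartNo>A2345</PartNo>
  </Product>
</ResultSet>
XML;

$xml = simplexml_load_string($feed);
if ($xml === false) {
    exit('Could not parse the feed.');
}

// In-memory stream for the demo; a real script would fopen() the CSV file.
$fh = fopen('php://memory', 'w+');

// <ResultSet> is the root element, so $xml already points at it.
foreach ($xml->Product as $product) {
    fputcsv($fh, [
        (string) $product['orderNo'],     // attribute access
        (string) $product->Manufacturer,  // child element text
        (string) $product->Model,
        (string) $product->Years,
        (string) $product->PartNo,
    ], ',', '"', '\\');
}

rewind($fh);
$csv = stream_get_contents($fh);
fclose($fh);
echo $csv;
```

Building the row explicitly avoids get_object_vars(), which on a SimpleXMLElement also returns the attributes as a nested array that fputcsv cannot write as a single field.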
Using PHP to convert XML to CSV is fairly easy, at least in the situations I've encountered so far. In my case, it would save me significant work if I could simply convert structured XML data into CSV data. Typically, I want to convert only the data in a particular xpath of the original XML document. The PHP function below will load an XML file and convert the elements in the specified xpath to simple csv data.
function xml2csv ($xmlFile, $xPath) {
// Load the XML file
$xml = simplexml_load_file($xmlFile);
// Start with an empty string so the .= below has something to append to
$csvData = '';
// Jump to the specified xpath
$path = $xml->xpath($xPath);
// Loop through the specified xpath
foreach($path as $item) {
// Collect the quoted elements of this row
$fields = array();
// Loop through the elements in this xpath
foreach($item as $key => $value) {
// Double embedded quotes so the field stays valid CSV
$fields[] = '"' . str_replace('"', '""', trim($value)) . '"';
}
// Join the fields with commas and add an LF
$csvData .= implode(',', $fields) . "\n";
}
// Return the CSV data
return $csvData;
}
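As a variation, the quoting can be left to fputcsv() by writing into a temporary stream and reading the result back. This is only a sketch: xml_string_to_csv() is a hypothetical name, and it takes the XML as a string rather than a file path so the example is self-contained.

```php
<?php
// Sketch only: xml_string_to_csv() is a hypothetical variant of xml2csv().
function xml_string_to_csv(string $xmlText, string $xPath): string
{
    $xml = simplexml_load_string($xmlText);
    if ($xml === false) {
        throw new InvalidArgumentException('Could not parse XML.');
    }

    $buffer = fopen('php://temp', 'w+');
    // xpath() can return false or null on failure, so fall back to an empty list.
    foreach ($xml->xpath($xPath) ?: [] as $item) {
        $row = [];
        foreach ($item->children() as $child) {
            $row[] = trim((string) $child);
        }
        fputcsv($buffer, $row, ',', '"', '\\'); // fputcsv decides what needs quoting
    }

    rewind($buffer);
    $csv = stream_get_contents($buffer);
    fclose($buffer);
    return $csv;
}

$sample = '<phones><phone><title>Nokia Lumia 830</title><color>black</color></phone></phones>';
$csv = xml_string_to_csv($sample, '/phones/phone');
echo $csv;
```

Unlike the hand-quoted version, fputcsv() only quotes fields that need it, so the two outputs differ in form but parse to the same data.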