First, I know this is a rather long and detailed post. If you are looking for the gist of my problem, you can jump to the bottom, where I have a TL;DR. Thanks in advance to all commenters.
I have been working on a feature for my client's website. They have an older version of Microsoft Excel on Mac which does not support XML, while the store system they use exports XML.
So I need to code the ability to convert CSV into XML, and the XML must conform to the structure required by the store component. I have already coded an XML-to-CSV function, which does work.
This is the XML output by the store system (I have removed the values for security of my client's customers):
<orders>
<order>
<order_id>38</order_id>
<order_number>000015</order_number>
<order_status>Authorized</order_status>
<order_date>0000-00-00 00:00:00</order_date>
<customer_email>test#someemail.ca</customer_email>
<order_amount>order total</order_amount>
<base_order_amount>pre shipping order total</base_order_amount>
<shipping_type>Basic Shipping</shipping_type>
<shipping_price> $0.00</shipping_price>
<billing_first_name>Name</billing_first_name>
<billing_last_name>B</billing_last_name>
<billing_address1>PO / Add</billing_address1>
<billing_address2></billing_address2>
<billing_city>Town</billing_city>
<billing_state_province>province</billing_state_province>
<billing_country>Canada</billing_country>
<billing_postal_code>postal code</billing_postal_code>
<billing_phone></billing_phone>
<emt_quest>test</emt_quest>
<emt_answ>test</emt_answ>
<emt_answ_conf>test</emt_answ_conf>
<shipping_first_name>Name</shipping_first_name>
<shipping_last_name>B</shipping_last_name>
<shipping_address1>PO / Add</shipping_address1>
<shipping_address2></shipping_address2>
<shipping_city>Town</shipping_city>
<shipping_state_province>province</shipping_state_province>
<shipping_country>Canada</shipping_country>
<shipping_postal_code>postal code</shipping_postal_code>
<shipping_phone></shipping_phone>
<items>
<item>
<item_name>Sample Item</item_name>
<item_price>$8.00</item_price>
<item_quantity>12</item_quantity>
</item>
<item>
<item_name>Sample Item 2</item_name>
<item_price>$12.00</item_price>
<item_quantity>12</item_quantity>
</item>
</items>
</order>
</orders>
This is the code of my XML-to-CSV function:
<?php
function xml2csv($xmlFile, $xPath) {
$csvData = "";
// Load the XML file
$xml = simplexml_load_file($xmlFile);
// xpath to search
$path = $xml->order;
//get headers (xpath must match above)
$headers = get_object_vars($xml->order[0]);
// Loop through the first row to get headers
foreach($headers as $key => $value){
$csvData .= $key . ',';
}
// Trim off the extra comma
$csvData = trim($csvData, ',');
// Add an LF
$csvData .= "\n";
foreach($path as $item) {
// Loop through the elements in the specified xpath
foreach($item as $key => $value) {
//check for a second generation children of specified first generation child
if ($key == "items") {
$itemString = "";
// if first generation child has children then loop through each second gen child
foreach ($item->children() as $child) {
// loop through each xpath of second generation child
foreach($child as $value) {
// for value of each xpath of second generation child get value as out
foreach($value->children() as $out) {
//combine each value into itemString for export to .csv
$itemString .= $out . "|";
}
}
}
// place item string in csvData string and remove extra pipe
$csvData .= trim($itemString, "|");
}
//else put xpath values of first generation child in .csv
else {
$csvData .= trim($value) . ',';
}
}
// Trim off the extra comma
$csvData = trim($csvData, ',');
// Add an LF
$csvData .= "\n";
}
// Return the CSV data
return $csvData;
}
When called with a given .XML file from the store system, it outputs the following .CSV file (I have used dummy values; the 'item price' is not accidental):
order_id,order_number,order_status,order_date,customer_email,order_amount,base_order_amount,shipping_type,shipping_price,billing_first_name,billing_last_name,billing_address1,billing_address2,billing_city,billing_state_province,billing_country,billing_postal_code,billing_phone,emt_quest,emt_answ,emt_answ_conf,medicinal_use,shipping_first_name,shipping_last_name,shipping_address1,shipping_address2,shipping_city,shipping_state_province,shipping_country,shipping_postal_code,shipping_phone,items
00,000000,Authorized,0000-00-00 00:00:00,i#me.ca,$00.00,$00.00,Basic Shipping,$0.00,Me,Initial,123 Some Person Street,,Personville,Prov/State,Country,postal,,test,test,test,test,test,test,test,,test,test,test,test,,item name|item price|item quantity
01,000000,Authorized,0000-00-00 00:00:00,i#me.ca,$00.00,$00.00,Basic Shipping,$0.00,Me,Initial,123 Some Person Street,,Personville,Prov/State,Country,postal,,test,test,test,test,test,test,test,,test,test,test,test,,item name|item price|item quantity
02,000000,Authorized,0000-00-00 00:00:00,i#me.ca,$00.00,$00.00,Basic Shipping,$0.00,Me,Initial,123 Some Person Street,,Personville,Prov/State,Country,postal,,test,test,test,test,test,test,test,,test,test,test,test,,item name|item price|item quantity
03,000000,Authorized,0000-00-00 00:00:00,i#me.ca,$00.00,$00.00,Basic Shipping,$0.00,Me,Initial,123 Some Person Street,,Personville,Prov/State,Country,postal,,test,test,test,test,test,test,test,,test,test,test,test,,item name|item price|item quantity
04,000000,Authorized,0000-00-00 00:00:00,i#me.ca,$00.00,$00.00,Basic Shipping,$0.00,Me,Initial,123 Some Person Street,,Personville,Prov/State,Country,postal,,test,test,test,test,test,test,test,,test,test,test,test,,item name|item price|item quantity|item name|item price|item quantity
The purpose here is that my client can download a .CSV directly from the store system (rather than its default .XML), work with it in Excel as they process their orders, and then upload that .CSV back into the store, where it will automatically be converted to XML formed like I have shown above.
Since .CSV is a flat format, I condensed the items XML into a single .CSV field where each value is delimited by a |, which will not be used in any of our markup text on the site. For example: item name|item price|item quantity
Here is my code which attempts to achieve this. I come close, but I am getting some wonky behaviour with the output. It throws an undefined offset error on the noted line $itemvalue = $doc->createTextNode($irow[$g]); (as if the loop is running too many times) and also does not produce the expected output.
function contains($substring, $string) {
$pos = strpos($string, $substring);
if($pos === false) {
// string needle NOT found in haystack
return false;
}
else {
// string needle found in haystack
return true;
}
}
function csv2xml($csvData) {
$outputFilename = 'test.xml';
// Open csv to read
$input = fopen($csvData, 'rt');
// Get the headers of the file
$headers = fgetcsv($input);
// Create a new dom document with pretty formatting
$doc = new DomDocument();
$doc->formatOutput = true;
// Add a root node to the document
$root = $doc->createElement('orders');
$root = $doc->appendChild($root);
while (($row = fgetcsv($input)) !== FALSE) {
$container = $doc->createElement('order');
foreach ($headers as $i => $header)
{
//set temp file name here
$tempFile = "temp.csv";
//prepare mockCSV
$mockCSV = "";
$mockCSV .= "item_name,item_price,item_quantity";
$mockCSV .= "\n";
//check if current property has items data with |
if (contains("|", $row[$i])) {
//if it does create array of data
$item_arr = explode("|", $row[$i]);
//create header for 'items' node
$child = $doc->createElement($header);
$child = $container->appendChild($child);
//count for items
$count = 0;
foreach($item_arr as $k => $item) {
$mockCSV .= trim($item) . ",";
if($count == 2) {
// Trim off the extra comma
$mockCSV = trim($mockCSV, ',');
// Add an LF
$mockCSV .= "\n";
}
$count++;
}
// Trim off the extra comma
$mockCSV = trim($mockCSV, ',');
// Add an LF
$mockCSV .= "\n";
//put mock CSV data in temp file
$f = fopen($tempFile, "w");
fwrite($f, $mockCSV);
fclose($f);
//get data from temp file
$iteminput = fopen($tempFile, 'rt');
//get headers from temp file
$itemheaders = fgetcsv($iteminput);
while (($irow = fgetcsv($iteminput)) !== FALSE) {
$itemchild = $doc->createElement('item');
foreach($itemheaders as $g => $itemheader) {
$subchild = $doc->createElement($itemheader);
$subchild = $itemchild->appendChild($subchild);
$itemvalue = $doc->createTextNode($irow[$g]); /* OFFSET HAPPENS HERE */
$itemvalue = $subchild->appendChild($itemvalue);
}
}
$itemchild = $child->appendChild($itemchild);
}
else {
$child = $doc->createElement($header);
$child = $container->appendChild($child);
$value = $doc->createTextNode($row[$i]);
$value = $child->appendChild($value);
}
}
$root->appendChild($container);
}
$strxml = $doc->saveXML();
$handle = fopen($outputFilename, "w");
fwrite($handle, $strxml);
fclose($handle);
}
echo csv2xml("test.csv");
?>
The expected output should be the same as the XML structure I posted above, but instead it is doing this:
<orders>
<order>
<order_id>38</order_id>
<order_number>000015</order_number>
<order_status>Authorized</order_status>
<order_date>0000-00-00 00:00:00</order_date>
<customer_email>test#someemail.ca</customer_email>
<order_amount>$96.00</order_amount>
<base_order_amount>$96.00</base_order_amount>
<shipping_type>Basic Shipping</shipping_type>
<shipping_price> $0.00</shipping_price>
<billing_first_name>Name</billing_first_name>
<billing_last_name>B</billing_last_name>
<billing_address1>PO / Add</billing_address1>
<billing_address2></billing_address2>
<billing_city>Town</billing_city>
<billing_state_province>province</billing_state_province>
<billing_country>Canada</billing_country>
<billing_postal_code>postal code</billing_postal_code>
<billing_phone></billing_phone>
<emt_quest>test</emt_quest>
<emt_answ>test</emt_answ>
<emt_answ_conf>test</emt_answ_conf>
<shipping_first_name>Name</shipping_first_name>
<shipping_last_name>B</shipping_last_name>
<shipping_address1>PO / Add</shipping_address1>
<shipping_address2></shipping_address2>
<shipping_city>Town</shipping_city>
<shipping_state_province>province</shipping_state_province>
<shipping_country>Canada</shipping_country>
<shipping_postal_code>postal code</shipping_postal_code>
<shipping_phone></shipping_phone>
<items>
<item>
<item_name></item_name>
<item_price></item_price>
<item_quantity></item_quantity>
</item>
</items>
</order>
</orders>
It is not putting the values in for some of the fields. It also does not repeat for double product entries, whose source .CSV field looks like this: item name|item price|item quantity|item name|item price|item quantity
This is my problem: I can't seem to handle the pipe-delimited field properly, and it doesn't output as expected. In an earlier version of the code I got all the data, but it did not create separate 'item' nodes.
Any help is much appreciated. At this point I think it's something simple and I just need another pair of eyes on it.
More to the point, I feel I am using very patchy code here, as I am out of practice with PHP. There must be some logic problem with how I am going about this; my way can work, but there must be a more streamlined method. If anyone could tell me what that is, that's the answer I'm really looking for.
TL;DR starts here
I am trying to convert .CSV data into structured .XML data, using pipe delimiting for the second- and third-generation XML children.
Only one field in my source .CSV file, 'items', contains such information; all other fields are single key, single entry. The data looks like this: item name|item price|item quantity|item name|item price|item quantity
So I check for | inside the .CSV field currently being run through the loop, and if it is detected, I use explode() to create an array of what was in there.
I've tried writing that information to a mock CSV file in a temp location and then using basic CSV-to-XML conversion, which does work in my program, to place the data into the XML DOM document.
Expected output:
<items>
<item>
<item_name>Sample Item</item_name>
<item_price>$8.00</item_price>
<item_quantity>12</item_quantity>
</item>
<item>
<item_name>Sample Item 2</item_name>
<item_price>$8.00</item_price>
<item_quantity>12</item_quantity>
</item>
</items>
Output I am getting:
<items>
<item>
<item_name></item_name>
<item_price></item_price>
<item_quantity></item_quantity>
</item>
</items>
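That's a lot of info, but I needed it to properly illustrate the issue. My problem is simple: how can I achieve the output I want?
Let me back up and offer a routine for CSV to XML first, then take care of the piped elements.
As for the symptoms in your code: the mock CSV ends with an extra blank line, which fgetcsv() returns as a row with a single empty field (hence the undefined offset), and appendChild($itemchild) is called after the while loop, so only that last, empty <item> is attached.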
Some comments:
I prefer SimpleXML over DOM for its ease of use, so I'll use it in the example. Of course, it can be done with DOM as well.
I'll make use of str_getcsv() instead of fgetcsv() to be able to create a working example online.
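For reading from an actual file, the same row handling can be done with fgetcsv(). This is a minimal sketch, assuming the first row holds the field names. The function name readCsvRows() is hypothetical, and php://memory stands in for a real file so the example is self-contained:

```php
<?php
// Hypothetical helper: read a CSV stream into a list of name => value rows.
function readCsvRows($handle) {
    // the escape argument is passed explicitly to avoid a deprecation notice on recent PHP versions
    $names = fgetcsv($handle, 0, ',', '"', '\\');
    $rows = array();
    while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        // fgetcsv() reports a blank line as array(null); skip it
        if ($fields === array(null)) {
            continue;
        }
        $rows[] = array_combine($names, $fields);
    }
    return $rows;
}

// php://memory stands in for fopen('test.csv', 'r')
$handle = fopen('php://memory', 'w+');
fwrite($handle, "order_id,items\n38,Sample Item|\$8.00|12\n\n");
rewind($handle);
$rows = readCsvRows($handle);
fclose($handle);
print_r($rows);
```

Skipping blank lines matters here, because a trailing empty line would otherwise produce a row that does not match the header count.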
basic CSV to XML
// XML: set up object
$xml = simplexml_load_string("<orders/>");
// CSV: assume CSV in $c, get it as a whole
$csv = str_getcsv($c, "\n");
// CSV: separate 1st row with field names from the following rows
$names = str_getcsv(array_shift($csv));
// CSV: parse row by row
foreach ($csv as $row) {
// CSV: combine names as keys => data as values
$row = array_combine($names, str_getcsv($row));
// XML: create new <order>
$xml_order = $xml->addChild("order");
// CSV: parse a single row
foreach ($row as $key => $value) {
// *****
// XML: create field as child of <order>
$xml_order->addChild($key, $value);
// *****
}
}
handle piped elements
the following code replaces the lines between // ***** above
// CSV: check for pipes, attention use strict comparison ===
if (strpos($value, "|") === false) {
// XML: no pipe, create node as a child of <order>
$xml_order->addChild($key, $value);
} else {
// CSV: pipe present, split up data
$csv_items = str_getcsv($value,"|");
// XML: create <items> node
$xml_items = $xml_order->addChild($key);
// CSV: iterate over $csv_items, each 3 elements = 1 row
// chop row after row
while (!empty($csv_items)) {
// XML: create <item> node as child of <items>
$xml_item = $xml_items->addChild("item");
// XML: create children of <item> node
$xml_item->addChild("item_name", array_shift($csv_items));
$xml_item->addChild("item_price", array_shift($csv_items));
$xml_item->addChild("item_quantity", array_shift($csv_items));
}
}
combined code without comments
$xml = simplexml_load_string("<orders/>");
$csv = str_getcsv($c, "\n"); // assume CSV in $c
$names = str_getcsv(array_shift($csv));
foreach ($csv as $row) {
$row = array_combine($names, str_getcsv($row));
$xml_order = $xml->addChild("order");
foreach ($row as $key => $value) {
if (strpos($value, "|") === false)
$xml_order->addChild($key, $value);
else {
$csv_items = str_getcsv($value,"|");
$xml_items = $xml_order->addChild($key);
while (!empty($csv_items)) {
$xml_item = $xml_items->addChild("item");
$xml_item->addChild("item_name", array_shift($csv_items));
$xml_item->addChild("item_price", array_shift($csv_items));
$xml_item->addChild("item_quantity", array_shift($csv_items));
}
}
}
}
see it working: https://eval.in/368945
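As a variant of the while loop above, array_chunk() can group the piped values into threes, which makes it easy to detect an incomplete group. This is only a sketch: the function name addItems() and the decision to skip malformed groups are assumptions, not part of the routine above.

```php
<?php
// Hypothetical helper: add <items><item>...</item></items> below $xml_order.
function addItems(SimpleXMLElement $xml_order, $value) {
    $fields = array('item_name', 'item_price', 'item_quantity');
    $xml_items = $xml_order->addChild('items');
    // each group of three values describes one item
    foreach (array_chunk(explode('|', $value), count($fields)) as $chunk) {
        if (count($chunk) !== count($fields)) {
            continue; // assumption: skip an incomplete trailing group
        }
        $xml_item = $xml_items->addChild('item');
        foreach (array_combine($fields, $chunk) as $name => $text) {
            $xml_item->addChild($name, $text);
        }
    }
    return $xml_items;
}

$xml = simplexml_load_string('<orders/>');
$order = $xml->addChild('order');
addItems($order, 'Sample Item|$8.00|12|Sample Item 2|$12.00|12');
echo $xml->asXML();
```

Keeping the field names in one array means a fourth item property would only need one more entry there.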
I am currently working on a project to scrape data from a website. I have scraped all the data I need with the following code; however, I would like to know the easiest way to output this data into a comma-delimited CSV file. I had originally planned to move all of it into a table and then export it that way, but I don't know if that's the most efficient method.
<?php
$html = file_get_contents("http://www.zillow.com/homes/for_sale/Alamance-County-NC/2117_rid/36.181671,-78.943291,35.912411,-79.835243_rect/10_zm/1_fr/");
$DOM = new DOMDocument();
libxml_use_internal_errors(true);
$DOM->loadHTML($html);
$finder = new DomXPath($DOM);
$classname = 'property-address';
$nodes = $finder->query("//*[contains(@class, '$classname')]");
$csv_values = array();
foreach ($nodes as $node) {
$csv_values[] = $node->nodeValue;
}
$handle = fopen("C:\Users\Stephen\Documents\WorkCSV\work.csv", "w");
if (false !== $handle) {
fputcsv($handle, $csv_values);
}
?>
I was able to get all of my data into an array using the code provided by Dave. Also, in fopen I was using backslashes "\" , and after switching to forward slashes "/" I was able to produce an error I can work with for exporting to CSV.
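You can use fputcsv and iterator_to_array. Because iterator_to_array returns the DOM nodes themselves, map them to their text values first:
$nodes = $finder->query("//*[contains(@class, '$classname')]");
$values = array_map(function ($node) { return $node->nodeValue; }, iterator_to_array($nodes));
$handle = fopen("/path/to/file.csv", "w");
if (false !== $handle) {
fputcsv($handle, $values);
fclose($handle);
}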
Or like this:
$csv_values = array();
foreach ($nodes as $node) {
$csv_values[] = $node->nodeValue;
}
$handle = fopen("/path/to/file.csv", "w");
if (false !== $handle) {
fputcsv($handle, $csv_values);
}
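Note that both variants write all addresses into a single CSV row. If one address per row is preferred (an assumption about the desired layout), fputcsv() can be called once per value. A minimal sketch, with made-up sample values and php://memory standing in for the scraped nodes and the real file:

```php
<?php
// Made-up sample values standing in for the scraped $node->nodeValue strings
$csv_values = array('123 Main St, Burlington, NC', '45 Oak Ave, Graham, NC');

// php://memory stands in for fopen("/path/to/file.csv", "w")
$handle = fopen('php://memory', 'w+');
foreach ($csv_values as $value) {
    // one single-column row per address; fputcsv() quotes fields containing commas
    fputcsv($handle, array($value), ',', '"', '\\');
}
rewind($handle);
$output = stream_get_contents($handle);
fclose($handle);
echo $output;
```

Each address ends up quoted on its own line, so the embedded commas do not split it into several columns.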
On Windows, be careful to escape any backslashes in the file path, or use forward slashes instead:
$handle = fopen("c:\\folder\\file.csv", "r");
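The reason is that PHP interprets escape sequences such as \t or \f inside double-quoted strings. A small sketch of the options (the path itself is only an illustration):

```php
<?php
$broken  = "c:\temp\file.csv";    // \t becomes a tab and \f a form feed
$escaped = "c:\\temp\\file.csv";  // doubled backslashes stay literal
$single  = 'c:\temp\file.csv';    // single quotes do not interpret \t or \f
$forward = 'c:/temp/file.csv';    // forward slashes avoid the problem entirely

var_dump($escaped === $single);             // the two spellings are the same string
var_dump(strpos($broken, "\t") !== false);  // the "broken" path contains a tab
```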
Basically, what I want to do is open an XML file with PHP and edit it, which I can do using the fopen() function.
My issue is that I want to insert text in the middle of the document. Say the XML file has 10 lines and I want to add something before the last line (line 10), so that it then has 11 lines. Is this possible? Thanks.
Depending on how large that file is, you might do:
$lines = array();
$fp = fopen('file.xml','r');
while (!feof($fp))
$lines[] = trim(fgets($fp));
fclose($fp);
array_splice($lines, 9, 0, array('newline1','newline2',...));
$new_content = implode("\n", $lines);
Still, you'll need to revalidate XML-syntax afterwards...
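A minimal sketch of that follow-up step, assuming the file fits in memory: insert the line, then check that the result still parses before saving it. The function name insertLineBefore() and the sample content are hypothetical.

```php
<?php
// Hypothetical helper: insert $newLine before line number $lineNo (1-based).
function insertLineBefore($content, $lineNo, $newLine) {
    $lines = explode("\n", $content);
    array_splice($lines, $lineNo - 1, 0, array($newLine));
    return implode("\n", $lines);
}

// sample content standing in for file_get_contents('file.xml')
$content = "<root>\n<a>1</a>\n</root>";
// insert before the last line (line 3), i.e. just before </root>
$new_content = insertLineBefore($content, 3, '<b>2</b>');

// check well-formedness without emitting warnings
libxml_use_internal_errors(true);
$is_valid = simplexml_load_string($new_content) !== false;
libxml_clear_errors();

if ($is_valid) {
    // save only when the XML still parses, e.g.:
    // file_put_contents('file.xml', $new_content);
}
echo $new_content;
```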
If you want to be able to modify a file from the middle, use the c+ open mode:
$fp = fopen('test.txt', 'c+');
for ($i=0;$i<5;$i++) {
fgets($fp);
}
fwrite($fp, "foo\n");
fclose($fp);
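The above skips five lines and writes "foo" at the start of the sixth, without having to read the file entirely. Note that fwrite() overwrites the bytes already at that position; it does not push the rest of the file down.
However, if you are modifying an XML document, it's probably better to use a DOM parser: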
$dom = new DOMDocument;
$dom->load('myfile.xml');
$linenum = 5;
$newNode = $dom->createElement('hello', 'world');
$element = $dom->firstChild->firstChild; // skips the root node
while ($element) {
if ($element->getLineNo() == $linenum) {
$element->parentNode->insertBefore($newNode, $element);
break;
}
$element = $element->nextSibling;
}
echo $dom->saveXML();
Of course, the above code depends on the actual XML document structure. But, the $element->getLineNo() is the key here.
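For the case in the question (adding something just before the last line), a sketch that does not depend on line numbers is to append the new element as the last child of the root. This assumes the last line of the file is the root's closing tag; the sample document is hypothetical.

```php
<?php
$dom = new DOMDocument;
// drop the existing indentation so formatOutput can re-indent the result
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
// sample document standing in for $dom->load('myfile.xml')
$dom->loadXML("<root>\n  <a>1</a>\n</root>");

// appendChild() places the node last, i.e. right before </root>
$newNode = $dom->createElement('hello', 'world');
$dom->documentElement->appendChild($newNode);

echo $dom->saveXML();
```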