Large XML file Parsing Chunk data Filtering in PHP - php

I have a large XML file (more than 100 MB). I am reading the file in chunks like this:
$fp = fopen('large.xml', 'r');
while ($data = fread($fp, 4096)) {
The format of XML is like this
<PersonalInfo>
<UserDetail>
<FirstName>ABC</FirstName>
<Occupation>Student</Occupation>
<DateOfBirth>08/14/1999</DateOfBirth>
</UserDetail>
<CaseDetail>....</CaseDetail>
<TransactionDetail>....</TransactionDetail>
</PersonalInfo>
<PersonalInfo>
<UserDetail>
<FirstName>XYZ</FirstName>
<Occupation>Student</Occupation>
<DateOfBirth>04/25/1991</DateOfBirth>
</UserDetail>
<CaseDetail>....</CaseDetail>
<TransactionDetail>.....</TransactionDetail>
</PersonalInfo>
<PersonalInfo>
<UserDetail>
<FirstName>DEF</FirstName>
<Occupation>Teacher</Occupation>
<DateOfBirth>05/12/1984</DateOfBirth>
</UserDetail>
<CaseDetail>....</CaseDetail>
<TransactionDetail>...</TransactionDetail>
</PersonalInfo>
I want to include only those records where the Occupation tag is "Student" and write those results to a CSV file.
I have tried preg_match as
preg_match( "/<PersonalInfo>(.*?)<\/PersonalInfo>/s", $data, $match );
to select the tags and then look into $match, but it is returning double values (repetition).

First check that your XML is well-formed, for example with the following tool:
http://www.xmlformatter.net/
If it is, you can do the following:
$dom = new DOMDocument('1.0', 'UTF-8');
// load() reads the whole file into memory, which can be heavy for a file over 100 MB
$dom->load('large.xml');
$tags = $dom->getElementsByTagName('PersonalInfo');
foreach ($tags as $destination) {
    // <Occupation> is nested inside <UserDetail>, so look it up by name
    $occupation = $destination->getElementsByTagName('Occupation')->item(0);
    if ($occupation !== null && trim($occupation->textContent) === 'Student') {
        echo "Write code to create csv file";
    }
}
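Since DOMDocument keeps the whole document in memory, a streaming reader may suit a file of this size better. The following is only a minimal sketch using XMLReader. It assumes the records are wrapped in a single root element (the sample in the question shows none), and the function name filterStudentsToCsv and the choice of CSV columns are hypothetical.

```php
<?php
// Sketch: stream <PersonalInfo> records with XMLReader and keep only students.
// Assumption: the file has one root element wrapping all records.
function filterStudentsToCsv(string $xmlPath, string $csvPath): int
{
    $reader = new XMLReader();
    if (!$reader->open($xmlPath)) {
        throw new RuntimeException("Cannot open $xmlPath");
    }
    $out = fopen($csvPath, 'w');
    if ($out === false) {
        throw new RuntimeException("Cannot open $csvPath");
    }
    fputcsv($out, ['FirstName', 'Occupation', 'DateOfBirth'], ',', '"', '\\');

    // Advance to the first <PersonalInfo> element.
    while ($reader->read() && $reader->name !== 'PersonalInfo') {
    }

    $written = 0;
    while ($reader->name === 'PersonalInfo') {
        // Only the current record is parsed into an object, not the whole file.
        $person = simplexml_load_string($reader->readOuterXml());
        $detail = $person->UserDetail;
        if (trim((string) $detail->Occupation) === 'Student') {
            fputcsv($out, [
                (string) $detail->FirstName,
                (string) $detail->Occupation,
                (string) $detail->DateOfBirth,
            ], ',', '"', '\\');
            $written++;
        }
        // Skip this record's subtree and move to the next <PersonalInfo>.
        $reader->next('PersonalInfo');
    }

    $reader->close();
    fclose($out);
    return $written;
}
```

Calling filterStudentsToCsv('large.xml', 'students.csv') would return the number of rows written. This also sidesteps the regex problem of a record being split across two fread() chunks.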

Related

PHP Replace all values in the third column in a csv file

I am trying to replace some hyperlinks in a csv file, like this one:
https://assets.suredone.com/683987/media-pics/6164307j-gabriel-61643-proguard-steel-shock-absorber-for-select-chevrolet-gmc-models.jpg. Here is my code:
<?php
$in_file = 'gabriel-images-urls.csv';
$out_file = 'results.csv';
$fd = fopen($in_file, "r");
$new_array= array();
$toBoot= array();
while ($data = fgetcsv($fd)) {
echo '<pre>';
if (strpos($data[2],'media-pics') !== false) {
$data[2]=str_replace('media-pics','media-photos',$data[2]);
fputcsv($fd, $data);
// echo $output;
}
}
?>
The new link, for example, must look like this: https://assets.suredone.com/683987/media-photos/6164307j-gabriel-61643-proguard-steel-shock-absorber-for-select-chevrolet-gmc-models.jpg. The goal is for the "media-pics" substring to be replaced with "media-photos". At this point nothing happens in the file. I think this is because the file is open only for reading, but I am not sure.
Can you not simply do a string replacement on the whole file rather than attempting to load and process each line of the file using fgetcsv?
<?php
$srcfile='gabriel-images-urls.csv';
$outfile='results.csv';
$csvdata=file_get_contents( $srcfile );
$moddata=str_replace('media-pics','media-photos',$csvdata);
file_put_contents( $outfile, $moddata );
?>
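If the replacement really must be limited to the third column, as the title asks, a row-by-row copy into a second file handle is one option. This is a hedged sketch rather than the answerer's method; replaceInThirdColumn() is a hypothetical helper, and it assumes a comma-delimited file with standard double-quote enclosures.

```php
<?php
// Sketch: copy a CSV, replacing a substring only in the third column.
// replaceInThirdColumn() is a hypothetical helper name.
function replaceInThirdColumn(string $inFile, string $outFile, string $search, string $replace): int
{
    $in = fopen($inFile, 'r');
    $out = fopen($outFile, 'w'); // separate handle, opened for writing
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open input or output file');
    }

    $changed = 0;
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        if (isset($row[2]) && strpos($row[2], $search) !== false) {
            $row[2] = str_replace($search, $replace, $row[2]);
            $changed++;
        }
        // Every row is written, changed or not, so the output stays complete.
        fputcsv($out, $row, ',', '"', '\\');
    }

    fclose($in);
    fclose($out);
    return $changed;
}
```

For the files in the question, the call would be replaceInThirdColumn('gabriel-images-urls.csv', 'results.csv', 'media-pics', 'media-photos'). Unlike the original loop, it never writes to the read-only input handle.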

PHP CSV to XML how to deal with pipe delimited strings

Firstly, I know this is a rather long and detailed post. If you are looking for the gist of my problem, you can jump to the bottom, where I have a TL;DR. Thanks in advance to all commenters.
I have been working on a feature for my client's website. They have an older version of Microsoft Excel on Mac which does not support .XML, and the store system they use works with .XML.
So I need to code the ability to convert CSV into XML, but the XML must conform to the structure required by the store component. I have already coded an XML to CSV function which does work.
This is the XML output by the store system (I have removed the values for security of my client's customers):
<orders>
<order>
<order_id>38</order_id>
<order_number>000015</order_number>
<order_status>Authorized</order_status>
<order_date>0000-00-00 00:00:00</order_date>
<customer_email>test#someemail.ca</customer_email>
<order_amount>order total</order_amount>
<base_order_amount>pre shipping order total</base_order_amount>
<shipping_type>Basic Shipping</shipping_type>
<shipping_price> $0.00</shipping_price>
<billing_first_name>Name</billing_first_name>
<billing_last_name>B</billing_last_name>
<billing_address1>PO / Add</billing_address1>
<billing_address2></billing_address2>
<billing_city>Town</billing_city>
<billing_state_province>province</billing_state_province>
<billing_country>Canada</billing_country>
<billing_postal_code>postal code</billing_postal_code>
<billing_phone></billing_phone>
<emt_quest>test</emt_quest>
<emt_answ>test</emt_answ>
<emt_answ_conf>test</emt_answ_conf>
<shipping_first_name>Name</shipping_first_name>
<shipping_last_name>B</shipping_last_name>
<shipping_address1>PO / Add</shipping_address1>
<shipping_address2></shipping_address2>
<shipping_city>Town</shipping_city>
<shipping_state_province>province</shipping_state_province>
<shipping_country>Canada</shipping_country>
<shipping_postal_code>postal code</shipping_postal_code>
<shipping_phone></shipping_phone>
<items>
<item>
<item_name>Sample Item</item_name>
<item_price>$8.00</item_price>
<item_quantity>12</item_quantity>
</item>
<item>
<item_name>Sample Item 2</item_name>
<item_price>$12.00</item_price>
<item_quantity>12</item_quantity>
</item>
</items>
</order>
This is the code of my XML to CSV function
<?php
function xml2csv($xmlFile, $xPath) {
$csvData = "";
// Load the XML file
$xml = simplexml_load_file($xmlFile);
// xpath to search
$path = $xml->order;
//get headers (xpath must match above)
$headers = get_object_vars($xml->order[0]);
// Loop through the first row to get headers
foreach($headers as $key => $value){
$csvData .= $key . ',';
}
// Trim off the extra comma
$csvData = trim($csvData, ',');
// Add an LF
$csvData .= "\n";
foreach($path as $item) {
// Loop through the elements in specified xpath
foreach($item as $key => $value) {
//check for a second generation children of specified first generation child
if ($key == "items") {
$itemString = "";
// if first generation child has children then loop through each second gen child
foreach ($item->children() as $child) {
// loop through each xpath of second generation child
foreach($child as $value) {
// for value of each xpath of second generation child get value as out
foreach($value->children() as $out) {
//combine each value into itemString for export to .csv
$itemString .= $out . "|";
}
}
}
// place item string in csvData string and remove extra pipe
$csvData .= trim($itemString, "|");
}
//else put xpath values of first generation child in .csv
else {
$csvData .= trim($value) . ',';
}
}
// Trim off the extra comma
$csvData = trim($csvData, ',');
// Add an LF
$csvData .= "\n";
}
// Return the CSV data
return $csvData;
}
When called with a given .XML file from the store system it outputs the following .CSV file (I have used dummy values the 'item price' is not accidental)
order_id,order_number,order_status,order_date,customer_email,order_amount,base_order_amount,shipping_type,shipping_price,billing_first_name,billing_last_name,billing_address1,billing_address2,billing_city,billing_state_province,billing_country,billing_postal_code,billing_phone,emt_quest,emt_answ,emt_answ_conf,medicinal_use,shipping_first_name,shipping_last_name,shipping_address1,shipping_address2,shipping_city,shipping_state_province,shipping_country,shipping_postal_code,shipping_phone,items
00,000000,Authorized,0000-00-00 00:00:00,i#me.ca,$00.00,$00.00,Basic Shipping,$0.00,Me,Initial,123 Some Person Street,,Personville,Prov/State,Country,postal,,test,test,test,test,test,test,test,,test,test,test,test,,item name|item price|item quantity
01,000000,Authorized,0000-00-00 00:00:00,i#me.ca,$00.00,$00.00,Basic Shipping,$0.00,Me,Initial,123 Some Person Street,,Personville,Prov/State,Country,postal,,test,test,test,test,test,test,test,,test,test,test,test,,item name|item price|item quantity
02,000000,Authorized,0000-00-00 00:00:00,i#me.ca,$00.00,$00.00,Basic Shipping,$0.00,Me,Initial,123 Some Person Street,,Personville,Prov/State,Country,postal,,test,test,test,test,test,test,test,,test,test,test,test,,item name|item price|item quantity
03,000000,Authorized,0000-00-00 00:00:00,i#me.ca,$00.00,$00.00,Basic Shipping,$0.00,Me,Initial,123 Some Person Street,,Personville,Prov/State,Country,postal,,test,test,test,test,test,test,test,,test,test,test,test,,item name|item price|item quantity
04,000000,Authorized,0000-00-00 00:00:00,i#me.ca,$00.00,$00.00,Basic Shipping,$0.00,Me,Initial,123 Some Person Street,,Personville,Prov/State,Country,postal,,test,test,test,test,test,test,test,,test,test,test,test,,item name|item price|item quantity|item name|item price|item quantity
The purpose here is that my client can download a .CSV directly from the store system (rather than its default .XML) - deal with it in excel as they need to process their orders, and then upload that .CSV back into the store - where it will automatically convert to XML formed like I have shown above.
Since .CSV is a flat format what I did was condense the items XML into a simple .CSV string where each value is delimited by a | which will not be used in any of our markup text on the site. As such item name|item price|item quantity
Here is my code which attempts to achieve this. I come close, but I am having some wonky behaviour with the output. It throws an undefined offset error on the noted line $itemvalue = $doc->createTextNode($irow[$g]); (as if the loop is running too many times) and also does not produce the expected output.
function contains($substring, $string) {
$pos = strpos($string, $substring);
if($pos === false) {
// string needle NOT found in haystack
return false;
}
else {
// string needle found in haystack
return true;
}
}
function csv2xml($csvData) {
$outputFilename = 'test.xml';
// Open csv to read
$input = fopen($csvData, 'rt');
// Get the headers of the file
$headers = fgetcsv($input);
// Create a new dom document with pretty formatting
$doc = new DomDocument();
$doc->formatOutput = true;
// Add a root node to the document
$root = $doc->createElement('orders');
$root = $doc->appendChild($root);
while (($row = fgetcsv($input)) !== FALSE) {
$container = $doc->createElement('order');
foreach ($headers as $i => $header)
{
//set temp file name here
$tempFile = "temp.csv";
//prepare mockCSV
$mockCSV = "";
$mockCSV .= "item_name,item_price,item_quantity";
$mockCSV .= "\n";
//check if current property has items data with |
if (contains("|", $row[$i])) {
//if it does create array of data
$item_arr = explode("|", $row[$i]);
//create header for 'items' node
$child = $doc->createElement($header);
$child = $container->appendChild($child);
//count for items
$count = 0;
foreach($item_arr as $k => $item) {
$mockCSV .= trim($item) . ",";
if($count == 2) {
// Trim off the extra comma
$mockCSV = trim($mockCSV, ',');
// Add an LF
$mockCSV .= "\n";
}
$count++;
}
// Trim off the extra comma
$mockCSV = trim($mockCSV, ',');
// Add an LF
$mockCSV .= "\n";
//put mock CSV data in temp file
$f = fopen($tempFile, "w");
fwrite($f, $mockCSV);
fclose($f);
//get data from temp file
$iteminput = fopen($tempFile, 'rt');
//get headers from temp file
$itemheaders = fgetcsv($iteminput);
while (($irow = fgetcsv($iteminput)) !== FALSE) {
$itemchild = $doc->createElement('item');
foreach($itemheaders as $g => $itemheader) {
$subchild = $doc->createElement($itemheader);
$subchild = $itemchild->appendChild($subchild);
$itemvalue = $doc->createTextNode($irow[$g]); /* OFFSET HAPPENS HERE */
$itemvalue = $subchild->appendChild($itemvalue);
}
}
$itemchild = $child->appendChild($itemchild);
}
else {
$child = $doc->createElement($header);
$child = $container->appendChild($child);
$value = $doc->createTextNode($row[$i]);
$value = $child->appendChild($value);
}
}
$root->appendChild($container);
}
$strxml = $doc->saveXML();
$handle = fopen($outputFilename, "w");
fwrite($handle, $strxml);
fclose($handle);
}
echo csv2xml("test.csv");
?>
The expected output should be the same as the XML structure I posted above, but instead it is doing this:
<orders>
<order>
<order_id>38</order_id>
<order_number>000015</order_number>
<order_status>Authorized</order_status>
<order_date>0000-00-00 00:00:00</order_date>
<customer_email>test#someemail.ca</customer_email>
<order_amount>$96.00</order_amount>
<base_order_amount>$96.00</base_order_amount>
<shipping_type>Basic Shipping</shipping_type>
<shipping_price> $0.00</shipping_price>
<billing_first_name>Name</billing_first_name>
<billing_last_name>B</billing_last_name>
<billing_address1>PO / Add</billing_address1>
<billing_address2></billing_address2>
<billing_city>Town</billing_city>
<billing_state_province>province</billing_state_province>
<billing_country>Canada</billing_country>
<billing_postal_code>postal code</billing_postal_code>
<billing_phone></billing_phone>
<emt_quest>test</emt_quest>
<emt_answ>test</emt_answ>
<emt_answ_conf>test</emt_answ_conf>
<shipping_first_name>Name</shipping_first_name>
<shipping_last_name>B</shipping_last_name>
<shipping_address1>PO / Add</shipping_address1>
<shipping_address2></shipping_address2>
<shipping_city>Town</shipping_city>
<shipping_state_province>province</shipping_state_province>
<shipping_country>Canada</shipping_country>
<shipping_postal_code>postal code</shipping_postal_code>
<shipping_phone></shipping_phone>
<items>
<item>
<item_name></item_name>
<item_price></item_price>
<item_quantity></item_quantity>
</item>
</items>
</order>
It is also not putting the values in for some of the fields, and it does not repeat for double product entries, whose source .CSV field looks like this: item name|item price|item quantity|item name|item price|item quantity
This is my problem: I can't seem to handle the pipe-delimited field properly, and it doesn't output as expected. In an earlier version of the code I got all the data, but it did not create separate 'item' nodes.
Any help is much appreciated. At this point I think it's something simple and I just need another pair of eyes on the subject.
More to the point, I feel I am using very patchy code here, as I am out of practice with PHP. There must be some logic problem with how I am going about this. My way can work, but there must be a more streamlined method; if anyone could tell me what that is, that's the answer I'm really looking for.
TL;DR starts here
I am trying to convert .CSV data into structured .XML data, using pipe delimiting for the second-generation and third-generation XML children.
Only one field in my source .CSV file, 'items', contains such information; all other fields are single key, single entry. The data looks like this: item name|item price|item quantity|item name|item price|item quantity
So I check for | inside the .CSV string currently being run through the loop and, if it is detected, use explode() to create an array of what was in there.
I've tried recreating a mock CSV file in a temp directory to hold this information, and then using basic CSV to XML conversion (which does work in my program) to place that data into the XML DOM document.
Expected output:
<items>
<item>
<item_name>Sample Item</item_name>
<item_price>$8.00</item_price>
<item_quantity>12</item_quantity>
</item>
<item>
<item_name>Sample Item 2</item_name>
<item_price>$8.00</item_price>
<item_quantity>12</item_quantity>
</item>
</items>
Output I am getting:
<items>
<item>
<item_name></item_name>
<item_price></item_price>
<item_quantity></item_quantity>
</item>
</items>
A lot of info I need to get out there to properly illustrate the issue but my problem is simple - how can I achieve the output I want.
Let me back up and offer a routine for CSV to XML first, then take care of the piped elements.
Some comments:
I prefer SimpleXML over DOM for its ease of use, so I'll use it in the example. Of course, it can be done with DOM as well.
I'll make use of str_getcsv() instead of fgetcsv() to be able to create a working example online.
basic CSV to XML
// XML: set up object
$xml = simplexml_load_string("<orders/>");
// CSV: assume CSV in $c, get it as a whole
$csv = str_getcsv($c, "\n");
// CSV: separate 1st row with field names from the following rows
$names = str_getcsv(array_shift($csv));
// CSV: parse row by row
foreach ($csv as $row) {
// CSV: combine names as keys => data as values
$row = array_combine($names, str_getcsv($row));
// XML: create new <order>
$xml_order = $xml->addChild("order");
// CSV: parse a single row
foreach ($row as $key => $value) {
// *****
// XML: create field as child of <order>
$xml_order->addChild($key, $value);
// *****
}
}
handle piped elements
the following code replaces the lines between // ***** above
// CSV: check for pipes, attention use strict comparison ===
if (strpos($value, "|") === false) {
// XML: no pipe, create node as a child of <order>
$xml_order->addChild($key, $value);
} else {
// CSV: pipe present, split up data
$csv_items = str_getcsv($value,"|");
// XML: create <items> node
$xml_items = $xml_order->addChild($key);
// CSV: iterate over $csv_items, each 3 elements = 1 row
// chop row after row
while (!empty($csv_items)) {
// XML: create <item> node as child of <items>
$xml_item = $xml_items->addChild("item");
// XML: create children of <item> node
$xml_item->addChild("item_name", array_shift($csv_items));
$xml_item->addChild("item_price", array_shift($csv_items));
$xml_item->addChild("item_quantity", array_shift($csv_items));
}
}
combine code without comments
$xml = simplexml_load_string("<orders/>");
$csv = str_getcsv($c, "\n"); // assume CSV in $c
$names = str_getcsv(array_shift($csv));
foreach ($csv as $row) {
$row = array_combine($names, str_getcsv($row));
$xml_order = $xml->addChild("order");
foreach ($row as $key => $value) {
if (strpos($value, "|") === false)
$xml_order->addChild($key, $value);
else {
$csv_items = str_getcsv($value,"|");
$xml_items = $xml_order->addChild($key);
while (!empty($csv_items)) {
$xml_item = $xml_items->addChild("item");
$xml_item->addChild("item_name", array_shift($csv_items));
$xml_item->addChild("item_price", array_shift($csv_items));
$xml_item->addChild("item_quantity", array_shift($csv_items));
}
}
}
}
see it working: https://eval.in/368945
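As a variation on the array_shift() loop above, the pipe-delimited field can be grouped into triples with array_chunk(), which makes it easy to skip an incomplete trailing group (the kind of input that produces an undefined offset in the question's code). This is a sketch only; addPipedItems() is a hypothetical helper, and it assumes every item has exactly three fields in the order name, price, quantity.

```php
<?php
// Sketch: turn "name|price|qty|name|price|qty" into <item> children.
// addPipedItems() is a hypothetical helper name.
function addPipedItems(SimpleXMLElement $items, string $piped): int
{
    $fields = array_map('trim', explode('|', $piped));
    $added = 0;
    foreach (array_chunk($fields, 3) as $chunk) {
        if (count($chunk) !== 3) {
            continue; // incomplete trailing group: skip it rather than read a missing offset
        }
        [$name, $price, $quantity] = $chunk;
        $item = $items->addChild('item');
        // Property assignment creates the child and escapes characters such as "&".
        $item->item_name = $name;
        $item->item_price = $price;
        $item->item_quantity = $quantity;
        $added++;
    }
    return $added;
}
```

Inside the combined code it could take the place of the while loop, for example addPipedItems($xml_order->addChild($key), $value).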

Exporting scraped data to CSV

I am currently working on a project to scrape data from a website. I have scraped all the data I need with the following code; however, I would like to know the easiest way to output this data into a comma-delimited CSV file. I had originally planned to move all of it into a table and then export it that way, but I don't know if that's the most efficient method.
<?php
$html = file_get_contents("http://www.zillow.com/homes/for_sale/Alamance-County-NC/2117_rid/36.181671,-78.943291,35.912411,-79.835243_rect/10_zm/1_fr/");
$DOM = new DOMDocument();
libxml_use_internal_errors(true);
$DOM->loadHTML($html);
$finder = new DomXPath($DOM);
$classname = 'property-address';
$nodes = $finder->query("//*[contains(@class, '$classname')]");
$csv_values = array();
foreach ($nodes as $node) {
$csv_values[] = $node->nodeValue;
}
$handle = fopen("C:\Users\Stephen\Documents\WorkCSV\work.csv", "w");
if (false !== $handle) {
fputcsv($handle, $csv_values);
}
?>
I was able to get all of my data into an array using the code provided by Dave. Also, in fopen I was using backslashes "\" , and after switching to forward slashes "/" I was able to produce an error I can work with for exporting to CSV.
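You can use fputcsv with iterator_to_array, mapping each node to its text first (fputcsv needs scalar values, not DOM nodes):
$nodes = $finder->query("//*[contains(@class, '$classname')]");
$handle = fopen("/path/to/file.csv", "w");
if (false !== $handle) {
    $values = array_map(function ($node) {
        return $node->nodeValue;
    }, iterator_to_array($nodes));
    fputcsv($handle, $values);
}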
Or like this:
$csv_values = array();
foreach ($nodes as $node) {
$csv_values[] = $node->nodeValue;
}
$handle = fopen("/path/to/file.csv", "w");
if (false !== $handle) {
fputcsv($handle, $csv_values);
}
On the Windows platform, be careful to escape any backslashes used in the path to the file, or use forward slashes.
$handle = fopen("c:\\folder\\file.csv", "r");
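Both variants above put every address into a single CSV row. If one address per row is wanted instead, a sketch along these lines may help. The function name exportAddresses(), the "address" header, and the inline HTML used to try it out are assumptions for illustration, not part of the original code.

```php
<?php
// Sketch: write each matching node's text as its own CSV row.
// exportAddresses() is a hypothetical helper name.
function exportAddresses(string $html, string $csvPath, string $classname = 'property-address'): int
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom->loadHTML($html);
    libxml_clear_errors();

    $finder = new DOMXPath($dom);
    $nodes = $finder->query("//*[contains(@class, '$classname')]");

    $handle = fopen($csvPath, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvPath");
    }
    fputcsv($handle, ['address'], ',', '"', '\\'); // header row

    $count = 0;
    foreach ($nodes as $node) {
        fputcsv($handle, [trim($node->nodeValue)], ',', '"', '\\');
        $count++;
    }
    fclose($handle);
    return $count;
}
```

With the question's setup, it could be called as exportAddresses($html, 'C:/Users/Stephen/Documents/WorkCSV/work.csv'), using forward slashes in the path.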

Adding data from php/mysql query into an XML file

I want to add/display data from querying from the database and add it into an XML file.
Example, I have a table_persons which has a name and age. I create a mysql query to get its name and age. Then simply put the data(name and age of persons) into an XML file.
How would you do that? Or is it possible?
I suggest you use DomDocument and file_put_contents to create your XML file.
Something like this:
// Create XML document
$doc = new DomDocument('1.0', 'UTF-8');
// Create root node
$root = $doc->createElement('persons');
$root = $doc->appendChild($root);
while ($row = mysqli_fetch_assoc($result)) { // mysql_fetch_assoc() was removed in PHP 7; $result is assumed to be a mysqli_result
// add node for each row
$node = $doc->createElement('person');
$node = $root->appendChild($node);
foreach ($row as $column => $value) {
$columnElement = $doc->createElement($column);
$columnElement = $node->appendChild($columnElement);
$columnValue = $doc->createTextNode($value);
$columnValue = $columnElement->appendChild($columnValue);
}
}
// Complete XML document
$doc->formatOutput = true;
$xmlContent = $doc->saveXML();
// Save to file
file_put_contents('persons.xml', $xmlContent);
<?php
[snip] //database code here
$f = fopen('myxml.xml', 'a+');
while ($row = mysqli_fetch_assoc($resultFromQuery))
{
$str = "<person>
<name>{$row['name']}</name>
<age>{$row['age']}</age>
</person>\n";
fwrite($f, $str);
}
fclose($f);
?>
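Assuming you use mysqli, this code works. If not, adapt it to fit. In the fopen function call, the a+ tells it to open the file for reading and writing, placing the pointer at the end of the file. Note that appending bare person elements leaves the file without a root element, and the values are not escaped.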
Best of luck.
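To try the DOM approach without a database connection, the row loop can be separated from the query. The sketch below takes an array of associative rows, such as the ones mysqli_fetch_assoc() yields; rowsToXml() and its parameter names are hypothetical, and it assumes the column names are valid XML element names.

```php
<?php
// Sketch: build an XML string from associative rows (for example, fetched rows).
// rowsToXml() is a hypothetical helper; column names must be valid XML names.
function rowsToXml(array $rows, string $rootName = 'persons', string $rowName = 'person'): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $root = $doc->appendChild($doc->createElement($rootName));

    foreach ($rows as $row) {
        $node = $root->appendChild($doc->createElement($rowName));
        foreach ($row as $column => $value) {
            $element = $node->appendChild($doc->createElement($column));
            // createTextNode() escapes characters such as "&" and "<".
            $element->appendChild($doc->createTextNode((string) $value));
        }
    }
    return $doc->saveXML();
}
```

The result can then be saved with file_put_contents('persons.xml', rowsToXml($rows)), where $rows is collected from the query loop.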

Write to a file using PHP

Basically, what I want to do is open an XML file with PHP and edit it. This I can do using the fopen() function.
Yet my issue is that I want to insert text in the middle of the document. Let's say the XML file has 10 lines and I want to add something before the last line (line 10), so that it now has 11 lines. Is this possible? Thanks
Depending on how large that file is, you might do:
$lines = array();
$fp = fopen('file.xml','r');
while (!feof($fp))
$lines[] = trim(fgets($fp));
fclose($fp);
array_splice($lines, 9, 0, array('newline1','newline2',...));
$new_content = implode("\n", $lines);
Still, you'll need to revalidate XML-syntax afterwards...
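The snippet above builds $new_content but does not write it back. A compact sketch of the full round trip, for the specific case of inserting before the last line, could look like this. insertBeforeLastLine() is a hypothetical helper, and it assumes the file is small enough to read into memory.

```php
<?php
// Sketch: insert lines before the last line of a text file.
// insertBeforeLastLine() is a hypothetical helper name.
function insertBeforeLastLine(string $path, array $newLines): void
{
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Offset -1 is the position of the last line; length 0 removes nothing,
    // so the new lines are inserted just before it.
    array_splice($lines, -1, 0, $newLines);
    file_put_contents($path, implode("\n", $lines) . "\n");
}
```

A negative offset avoids hard-coding the line count, so the same call works whether the file has 10 lines or 10,000.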
If you want to be able to modify a file from the middle, use the c+ open mode:
$fp = fopen('test.txt', 'c+');
for ($i=0;$i<5;$i++) {
fgets($fp);
}
fwrite($fp, "foo\n");
fclose($fp);
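Note that the above does not insert a line: after five lines have been read, fwrite() overwrites the bytes at the start of the sixth line with "foo\n" and leaves the rest of the file in place. It avoids reading the whole file, but only suits replacements of the same length.
However, if you are modifying an XML document, it's probably better to use a DOM parser: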
$dom = new DOMDocument;
$dom->load('myfile.xml');
$linenum = 5;
$newNode = $dom->createElement('hello', 'world');
$element = $dom->firstChild->firstChild; // skips the root node
while ($element) {
if ($element->getLineNo() == $linenum) {
$element->parentNode->insertBefore($newNode, $element);
break;
}
$element = $element->nextSibling;
}
echo $dom->saveXML();
Of course, the above code depends on the actual XML document structure. But, the $element->getLineNo() is the key here.
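For the case in the question, where the new content should go just before the closing root tag, line numbers are not needed at all: appending a child to the root element has that effect. A minimal sketch, assuming the document has a single root element and that a simple text-only element is what should be added; appendToRoot() is a hypothetical helper.

```php
<?php
// Sketch: add a new element as the last child of the root element.
// appendToRoot() is a hypothetical helper name.
function appendToRoot(string $xml, string $name, string $text): string
{
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false; // lets formatOutput re-indent the result
    $dom->formatOutput = true;
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('Input is not well-formed XML');
    }
    $node = $dom->createElement($name);
    $node->appendChild($dom->createTextNode($text)); // text is escaped for us
    $dom->documentElement->appendChild($node);
    return $dom->saveXML();
}
```

Because the document is re-serialised by the parser, the result stays well-formed, which the line-based approaches cannot guarantee.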
