Converting XML to CSV using PHP

I need to convert an XML file to CSV.
I have a script, but I am unsure how to adapt it to my needs.
Here is the script:
$filexml='141.xml';
if (file_exists($filexml)) {
$xml = simplexml_load_file($filexml);
$f = fopen('141.csv', 'w');
foreach ($xml->item as $item) {
fputcsv($f, get_object_vars($item),',','"');
}
fclose($f);
}
The file is called 141.xml, and here is part of the XML that I need to convert:
<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
<channel>
<item>
<title><![CDATA[//title name]]></title>
<link><![CDATA[https://www.someurl.co.uk]]></link>
<description><![CDATA[<p><span>Demo Description</span></p>]]></description>
<g:id><![CDATA[4796]]></g:id>
<g:condition><![CDATA[new]]></g:condition>
<g:price><![CDATA[0.89 GBP]]></g:price>
<g:availability><![CDATA[in stock]]></g:availability>
<g:image_link><![CDATA[https://image-location.png]]></g:image_link>
<g:service><![CDATA[Free Shipping]]></g:service>
<g:price><![CDATA[0 GBP]]></g:price>
</item>
I am running the script from SSH using:
php /var/www/vhosts/mywebsite.co.uk/httpdocs/xml/convert.php
If you can help me, it would be really appreciated :)
Thanks

Consider collecting the XML data into an array, $values, and exporting it to CSV row by row.
Specifically, use the xpath() function to iterate through each <item> and extract the values of all its children (/*). Note that this version also writes column headers to the CSV file.
$filexml='141.xml';
if (file_exists($filexml)) {
$xml = simplexml_load_file($filexml);
$i = 1; // Position counter
$values = []; // PHP array
// Writing column headers
$columns = array('title', 'link', 'description', 'id', 'condition',
'price', 'availability', 'image_link', 'service', 'price');
$fs = fopen('141.csv', 'w');
fputcsv($fs, $columns);
fclose($fs);
// Iterate through each <item> node
$node = $xml->xpath('//item');
foreach ($node as $n) {
// Iterate through each child of <item> node
$child = $xml->xpath('//item['.$i.']/*');
foreach ($child as $value) {
$values[] = $value;
}
// Write to CSV files (appending to column headers)
$fs = fopen('141.csv', 'a');
fputcsv($fs, $values);
fclose($fs);
$values = []; // Clean out array for next <item> (i.e., row)
$i++; // Move to next <item> (i.e., node position)
}
}
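A point worth making explicit about the original script: the <item> elements sit inside <channel>, not directly under <rss>, and get_object_vars() or children() without arguments only see the un-prefixed children, so the g:-prefixed fields are skipped unless you ask for the Google namespace explicitly. Here is a minimal sketch under those assumptions; itemsToCsv is a hypothetical name, and the XML is assumed to be well-formed (with the closing </channel> and </rss> tags present):

```php
<?php
// Hypothetical helper: one CSV row per <item>, including g:-prefixed fields.
// Assumes a well-formed RSS document (closing </channel> and </rss> present).
function itemsToCsv(string $xmlString, $out): void
{
    $gNs = 'http://base.google.com/ns/1.0'; // URI declared by xmlns:g in the feed
    $rss = new SimpleXMLElement($xmlString);

    // <item> elements are children of <channel>, not of <rss>
    foreach ($rss->channel->item as $item) {
        $row = [];
        // children() without arguments: un-prefixed children (title, link, description)
        foreach ($item->children() as $child) {
            $row[] = trim((string)$child);
        }
        // children($gNs): children in the Google namespace (g:id, g:price, ...)
        foreach ($item->children($gNs) as $child) {
            $row[] = trim((string)$child);
        }
        // Delimiter, enclosure and escape passed explicitly
        fputcsv($out, $row, ',', '"', '\\');
    }
}

// Usage with the files from the question:
// $out = fopen('141.csv', 'w');
// itemsToCsv(file_get_contents('141.xml'), $out);
// fclose($out);
```

The un-prefixed columns come first, followed by the g: columns in document order, so the header row has to follow the same order.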

Try the code below. Also note that the XML file has a syntax error: the closing tags for rss and channel are missing.
$filexml='141.xml';
if (file_exists($filexml))
{
$xml = simplexml_load_file($filexml);
$f = fopen('141.csv', 'w');
createCsv($xml, $f);
fclose($f);
}
function createCsv($xml,$f)
{
foreach ($xml->children() as $item)
{
$hasChild = count($item->children()) > 0;
if( ! $hasChild)
{
$put_arr = array($item->getName(),$item);
fputcsv($f, $put_arr ,',','"');
}
else
{
createCsv($item, $f);
}
}
}
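One caveat with the recursive version: children() called without arguments returns only the children without a namespace prefix, so g:id, g:price and the other g: elements from 141.xml are silently skipped. A hedged alternative, assuming local names (without the g: prefix) are acceptable in the first column, is to walk the tree with the relative XPath step *, which matches child elements in any namespace. flattenToPairs is a hypothetical name:

```php
<?php
// Hypothetical helper: collect [name, value] pairs for every leaf element.
function flattenToPairs(SimpleXMLElement $node, array &$pairs): void
{
    // The relative XPath step "*" selects child elements in any namespace,
    // so g:-prefixed elements are included, in document order.
    foreach ($node->xpath('*') as $child) {
        if (count($child->xpath('*')) > 0) {
            // Container element (e.g. <channel>, <item>): descend into it
            flattenToPairs($child, $pairs);
        } else {
            // Leaf element: getName() returns the local name, e.g. "id" for g:id
            $pairs[] = [$child->getName(), trim((string)$child)];
        }
    }
}

// Usage with the original file:
// $pairs = [];
// flattenToPairs(simplexml_load_file('141.xml'), $pairs);
// $f = fopen('141.csv', 'w');
// foreach ($pairs as $pair) {
//     fputcsv($f, $pair, ',', '"', '\\');
// }
// fclose($f);
```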

Related

Deleting multiple namespaces temporarily without saving to file in PHP?

The following code doesn't work, mainly because of the namespaces on the root element of the file I am trying to parse. I would like to remove the XML namespaces temporarily, without saving the changes to the file.
$fxml = "{$this->path}/input.xml";
if (file_exists($fxml)) {
$xml = simplexml_load_file($fxml);
$fs = fopen("{$this->path}/output.csv", 'w');
$xml->registerXPathNamespace('e', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$fieldDefs = [
'url' => 'url',
'id' => 'id',
];
fputcsv($fs, array_keys($fieldDefs));
foreach ($xml->xpath('//e:urlset') as $url) {
$fields = [];
foreach ($fieldDefs as $fieldDef) {
$fields[] = $url->xpath('e:'. $fieldDef)[0];
}
fputcsv($fs, $fields);
fclose($fs);
}
}
This script fails and produces an empty CSV with the following XML.
It already fails when only one namespace is declared on the root element:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.mywebsite.com/id/2111</loc>
<id>903660</id>
</url>
<url>
<loc>https://www.mywebsite.com/id/211</loc>
<id>911121</id>
</url>
</urlset>
The real issue is that my file has two namespaces declared on the root element. Is there a way to remove the namespaces to make processing simpler?
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://www.mywebsite.com/id/2111</loc>
<id>903660</id>
</url>
<url>
<loc>https://www.mywebsite.com/id/211</loc>
<id>911121</id>
</url>
</urlset>
You actually need to call registerXPathNamespace() on every SimpleXMLElement you run xpath() on. However, consider a simpler approach that avoids the bookkeeping of the $fields array and casts each element returned by XPath directly to a plain array:
// LOAD XML
$xml = simplexml_load_file($fxml);
// OUTER PARSE XML
$xml->registerXPathNamespace('e', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$urls = $xml->xpath('//e:url');
// INITIALIZE CSV
$fs = fopen('output.csv', 'w');
// WRITE HEADERS
$headers = array_keys((array)$urls[0]);
fputcsv($fs, $headers);
// INNER PARSE XML
foreach($urls as $url) {
// WRITE ROWS
fputcsv($fs, (array)$url);
}
fclose($fs);
You would need to delete the namespace definitions and prefixes before loading the XML. This would change the meaning of the nodes and could break the XML. However, it is not needed.
The problem with SimpleXMLElement is that you need to re-register the namespaces on every instance you want to call xpath() on. Put that part in a small helper class and you're fine:
class SimpleXMLNamespaces {
private $_namespaces;
public function __construct(array $namespaces) {
$this->_namespaces = $namespaces;
}
function registerOn(SimpleXMLElement $target) {
foreach ($this->_namespaces as $prefix => $uri) {
$target->registerXpathNamespace($prefix, $uri);
}
}
}
You already have a mapping array for the field definitions. Put the full Xpath expression for the fields into it:
$xmlns = new SimpleXMLNamespaces(
[
'sitemap' => 'http://www.sitemaps.org/schemas/sitemap/0.9',
'xhtml' => 'http://www.w3.org/1999/xhtml',
]
);
$urlset = new SimpleXMLElement($xml);
$xmlns->registerOn($urlset);
$columns = [
'url' => 'sitemap:loc',
'id' => 'sitemap:id',
];
$fs = fopen("php://stdout", 'w');
fputcsv($fs, array_keys($columns));
foreach ($urlset->xpath('//sitemap:url') as $url) {
$xmlns->registerOn($url);
$row = [];
foreach ($columns as $expression) {
$row[] = (string)($url->xpath($expression)[0] ?? '');
}
fputcsv($fs, $row);
}
Output:
url,id
https://www.mywebsite.com/id/2111,903660
https://www.mywebsite.com/id/211,911121
Or use DOM. DOM has a separate class (DOMXpath) for XPath that stores the namespace registrations, so re-registering is not needed. Additionally, DOMXpath::evaluate() accepts XPath expressions that return scalar values directly.
// bootstrap DOM + Xpath
$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('sitemap', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$xpath->registerNamespace('xhtml', 'http://www.w3.org/1999/xhtml');
// include string cast in the Xpath expression
// it will return an empty string if it doesn't match
$columns = [
'url' => 'string(sitemap:loc)',
'id' => 'string(sitemap:id)',
];
$fs = fopen("php://stdout", 'w');
fputcsv($fs, array_keys($columns));
// iterate the url elements
foreach ($xpath->evaluate('//sitemap:url') as $url) {
$row = [];
foreach ($columns as $expression) {
// evaluate xpath expression for column
$row[] = $xpath->evaluate($expression, $url);
}
fputcsv($fs, $row);
}
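To illustrate the point about DOMXpath::evaluate() returning scalars: an expression wrapped in string() comes back as a PHP string (empty when nothing matches), count() comes back as a float, and a plain location path comes back as a DOMNodeList. A small sketch against a trimmed copy of the sitemap from the question (the lastmod lookup is only there to show the empty-string case):

```php
<?php
$xml = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
     . '<url><loc>https://www.mywebsite.com/id/2111</loc><id>903660</id></url>'
     . '<url><loc>https://www.mywebsite.com/id/211</loc><id>911121</id></url>'
     . '</urlset>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
$xpath->registerNamespace('sitemap', 'http://www.sitemaps.org/schemas/sitemap/0.9');

// count() yields an XPath number, returned as a PHP float
$urlCount = $xpath->evaluate('count(//sitemap:url)');

// string() yields the string value of the first matching node in document order
$firstLoc = $xpath->evaluate('string(//sitemap:url/sitemap:loc)');

// string() over an empty node-set yields "" instead of failing
$missing = $xpath->evaluate('string(//sitemap:url/sitemap:lastmod)');

// Without a scalar function, evaluate() returns a DOMNodeList
$ids = [];
foreach ($xpath->evaluate('//sitemap:url/sitemap:id') as $idNode) {
    $ids[] = $idNode->textContent;
}
```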
Sitemaps are often large; to reduce memory consumption, you can combine XMLReader with DOM.
// define a list of used namespaces
$xmlns = [
'sitemap' => 'http://www.sitemaps.org/schemas/sitemap/0.9',
'xhtml' => 'http://www.w3.org/1999/xhtml'
];
// create a DOM document for node expansion + xpath expressions
$document = new DOMDocument();
$xpath = new DOMXpath($document);
foreach ($xmlns as $prefix => $namespaceURI) {
$xpath->registerNamespace($prefix, $namespaceURI);
}
// open the XML for reading
$reader = new XMLReader();
$reader->open($xmlUri);
// go to the first url element in the sitemap namespace
while (
$reader->read() &&
(
$reader->localName !== 'url' ||
$reader->namespaceURI !== $xmlns['sitemap']
)
) {
continue;
}
$columns = [
'url' => 'string(sitemap:loc)',
'id' => 'string(sitemap:id)',
];
$fs = fopen("php://stdout", 'w');
fputcsv($fs, array_keys($columns));
// check that the current node is a url element
while ($reader->localName === 'url') {
// in the sitemap namespace
if ($reader->namespaceURI === $xmlns['sitemap']) {
// expand node to DOM for Xpath
$url = $reader->expand($document);
$row = [];
foreach ($columns as $expression) {
// evaluate xpath expression for column
$row[] = $xpath->evaluate($expression, $url);
}
fputcsv($fs, $row);
}
// go to the next url sibling node
$reader->next('url');
}
$reader->close();

XML to CSV with PHP converter [problem with images grabbing]

I really need help from someone who works with XML and PHP. I have looked at many other questions, but found nothing about my situation, where the XML has deeper (nested) fields that I can't get into the CSV output (code below).
<product>
<images>
<image>...</image>
<image>...</image>
</images>
</product>
My XML file looks like this:
<root>
<product>
<url>
<![CDATA[
https://
]]>
</url>
<id>185</id>
<barcode>284</barcode>
<categories>
<category>14</category>
<category>2</category>
</categories>
<title>
<![CDATA[ Product1 ]]>
</title>
<description>
<![CDATA[
<p>description</p>
]]>
</description>
<price>10</price>
<sec_costs>13.000000</sec_costs>
<quantity>10</quantity>
<warranty/>
<weight>0.000000</weight>
<delivery_text>
<![CDATA[ 1 - 2 d. ]]>
</delivery_text>
<manufacturer>
<![CDATA[ ]]>
</manufacturer>
<images>
<image>
<![CDATA[
https://test.eu/r.jpg
]]>
</image>
<image>
<![CDATA[
https://test.eu/er.jpg
]]>
</image>
<image>
<![CDATA[
https://test.eu/eer.jpg
]]>
</image>
</images>
<product_with_gift>
<![CDATA[ False ]]>
</product_with_gift>
<barcode_format>
<![CDATA[ EAN ]]>
</barcode_format>
</product>
I am using this code (taken from another member's answer) to convert the XML to CSV. The code works fine, except that it doesn't grab the images. I tried replacing image with images and adding extra image columns, but nothing worked; it just doesn't pick up the links to the image files:
<?
$filexml = 'imp2.xml';
$xml = simplexml_load_file($filexml);
$xml->registerXPathNamespace('g', 'http://base.google.com/ns/1.0');
if (file_exists($filexml)) {
$xml = simplexml_load_file($filexml);
$i = 1; // Position counter
$values = []; // PHP array
// Writing column headers
$columns = array('id', 'barcode', 'title', 'description', 'price', 'sec_costs', 'quantity', 'warranty', 'weight', 'delivery_text', 'manufacturer', 'image', 'product_with_gift', 'barcode_format');
$fs = fopen('csv.csv', 'w');
fputcsv($fs, $columns);
fclose($fs);
// Iterate through each <product> node
$node = $xml->xpath('//product');
foreach ($node as $n) {
// Iterate through each child of <item> node
foreach ($columns as $col) {
if (count($xml->xpath('//product['.$i.']/'.$col)) > 0) {
$values[] = trim($xml->xpath('//product['.$i.']/'.$col)[0]);
} else {
$values[] = '';
}
}
// Write to CSV files (appending to column headers)
$fs = fopen('csv.csv', 'a');
fputcsv($fs, $values);
fclose($fs);
$values = []; // Clean out array for next <item> (i.e., row)
$i++; // Move to next <item> (i.e., node position)
}
}
?>
Can anyone with intermediate or advanced XML/PHP experience suggest a solution?
The problem is that <image> is not a direct child of <product>: it sits inside <images>, so '//product[n]/image' matches nothing. Using images instead does not help either, because the URLs are the text of the child <image> nodes, and that text does not appear in the text of the higher-level <images> node.
I've made a few changes to the code, and I now use the <image> element to fetch the data. The code doesn't assume there is only one node per column, so for each XPath lookup it loops through all matches and joins them into a single comma-separated string before adding it to the CSV.
$filexml = 'imp2.xml';
if (file_exists($filexml)) {
// Only open file once you know it exists
$xml = simplexml_load_file($filexml);
$i = 1; // Position counter
$values = []; // PHP array
// Writing column headers
$columns = array('id', 'barcode', 'title', 'description', 'price', 'sec_costs', 'quantity', 'warranty', 'weight', 'delivery_text', 'manufacturer', 'image', 'product_with_gift', 'barcode_format');
// Open output file at start
$fs = fopen('csv.csv', 'w');
fputcsv($fs, $columns);
// Iterate through each <product> node
$node = $xml->xpath('//product');
foreach ($node as $n) {
// Look up each requested column inside this <product> node
foreach ($columns as $col) {
// Use //'.$col so node doesn't have to be directly under product
$dataMatch = $xml->xpath('//product['.$i.']//'.$col);
if (count($dataMatch) > 0) {
// Build list of all matches
$newData = '';
foreach ( $dataMatch as $data) {
$newData .= trim((string)$data).",";
}
// Remove last comma before adding it in
$values[] = rtrim($newData, ",");
} else {
$values[] = '';
}
}
fputcsv($fs, $values);
$values = []; // Clean out array for next <product> (i.e., row)
$i++; // Move to next <product> (i.e., node position)
}
// Close file only at end
fclose($fs);
}
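A variation on the same idea, sketched under the assumption that a pipe-separated list is acceptable for multi-valued fields: run the XPath relative to each <product> element ($product->xpath('.//image')) instead of rebuilding '//product[n]' with a position counter, and join the matches with implode(). productsToCsv is a hypothetical name, and the column list is shortened for the example:

```php
<?php
// Hypothetical helper: one CSV row per <product>; repeated/nested fields joined with "|".
function productsToCsv(SimpleXMLElement $root, array $columns, $out): void
{
    fputcsv($out, $columns, ',', '"', '\\');
    foreach ($root->xpath('//product') as $product) {
        $row = [];
        foreach ($columns as $col) {
            // ".//" searches below the current product only, at any depth
            $matches = $product->xpath('.//' . $col) ?: [];
            $values = array_map(
                static fn (SimpleXMLElement $m): string => trim((string)$m),
                $matches
            );
            $row[] = implode('|', $values);
        }
        fputcsv($out, $row, ',', '"', '\\');
    }
}

// Usage with the file from the question:
// $fs = fopen('csv.csv', 'w');
// productsToCsv(simplexml_load_file('imp2.xml'), ['id', 'barcode', 'title', 'image'], $fs);
// fclose($fs);
```

Using a separator other than the comma keeps the image column easy to split again later, although fputcsv() would quote a comma-joined value correctly as well.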

RSS parsing with PHP and SimpleXML: How to enter namespaced items?

I am parsing the following RSS feed (relevant part shown)
<item>
<title>xxx</title>
<link>xxx</link>
<guid>xxx</guid>
<description>xxx</description>
<prx:proxy>
<prx:ip>101.226.74.168</prx:ip>
<prx:port>8080</prx:port>
<prx:type>Anonymous</prx:type>
<prx:ssl>false</prx:ssl>
<prx:check_timestamp>1369199066</prx:check_timestamp>
<prx:country_code>CN</prx:country_code>
<prx:latency>20585</prx:latency>
<prx:reliability>9593</prx:reliability>
</prx:proxy>
<prx:proxy>...</prx:proxy>
<prx:proxy>...</prx:proxy>
<pubDate>xxx</pubDate>
</item>
<item>...</item>
<item>...</item>
<item>...</item>
Using the php code
$proxylist_rss = file_get_contents('http://www.xxx.com/xxx.xml');
$proxylist_xml = new SimpleXmlElement($proxylist_rss);
foreach($proxylist_xml->channel->item as $item) {
var_dump($item); // Ok, Everything marked with xxx
var_dump($item->title); // Ok, title
foreach($item->proxy() as $entry) {
var_dump($entry); //empty
}
}
While I can access everything marked with xxx, I cannot access anything inside prx:proxy, mainly because ':' cannot appear in a valid PHP property name.
The question is how to reach prx:ip, for example.
Thanks!
Take a look at SimpleXMLElement::children(); you can access the namespaced elements with it.
For example:
<?php
$xml = '<xml xmlns:prx="http://example.org/">
<item>
<title>xxx</title>
<link>xxx</link>
<guid>xxx</guid>
<description>xxx</description>
<prx:proxy>
<prx:ip>101.226.74.168</prx:ip>
<prx:port>8080</prx:port>
<prx:type>Anonymous</prx:type>
<prx:ssl>false</prx:ssl>
<prx:check_timestamp>1369199066</prx:check_timestamp>
<prx:country_code>CN</prx:country_code>
<prx:latency>20585</prx:latency>
<prx:reliability>9593</prx:reliability>
</prx:proxy>
</item>
</xml>';
$sxe = new SimpleXMLElement($xml);
foreach($sxe->item as $item)
{
$proxy = $item->children('prx', true)->proxy;
echo $proxy->ip; // 101.226.74.168
}
Anthony.
I would just strip out the "prx:"...
$proxylist_rss = file_get_contents('http://www.xxx.com/xxx.xml');
$proxylist_rss = str_replace('prx:', '', $proxylist_rss);
$proxylist_xml = new SimpleXmlElement($proxylist_rss);
foreach($proxylist_xml->channel->item as $item) {
foreach($item->proxy as $entry) {
var_dump($entry);
}
}
http://phpfiddle.org/main/code/jsz-vga
Try it like this:
$proxylist_rss = file_get_contents('http://www.xxx.com/xxx.xml');
$feed = simplexml_load_string($proxylist_rss);
$ns = $feed->getNamespaces(true);
foreach ($feed->channel->item as $item){
var_dump($item);
var_dump($item->title);
$proxy = $item->children($ns["prx"]);
$proxy = $proxy->proxy;
foreach ($proxy as $key => $value){
var_dump($value);
}
}
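Another option, sketched as an alternative to children(): register a prefix for the namespace URI and select the namespaced elements with XPath. XPath matches by namespace URI, so the prefix you register (p below) does not have to equal the prx prefix used in the feed. The URI http://example.org/ is the placeholder from the example above, the second proxy entry (192.0.2.10) is made-up sample data, and getDocNamespaces(true) is used to look up whatever URI the real feed binds to prx:

```php
<?php
$rss = '<rss xmlns:prx="http://example.org/"><channel>'
     . '<item><title>xxx</title>'
     . '<prx:proxy><prx:ip>101.226.74.168</prx:ip><prx:port>8080</prx:port></prx:proxy>'
     . '<prx:proxy><prx:ip>192.0.2.10</prx:ip><prx:port>3128</prx:port></prx:proxy>'
     . '</item></channel></rss>';

$feed = new SimpleXMLElement($rss);
// Look up the URI bound to the "prx" prefix in the document
$prxUri = $feed->getDocNamespaces(true)['prx'];

$proxies = [];
foreach ($feed->channel->item as $item) {
    // Prefixes must be registered on each element you call xpath() on
    $item->registerXPathNamespace('p', $prxUri);
    foreach ($item->xpath('p:proxy') as $proxy) {
        $proxy->registerXPathNamespace('p', $prxUri);
        $proxies[] = [
            'ip'   => (string)($proxy->xpath('p:ip')[0] ?? ''),
            'port' => (string)($proxy->xpath('p:port')[0] ?? ''),
        ];
    }
}
```

Whether this reads better than children() is largely a matter of taste; XPath pays off when you need conditions, such as only the proxies with a given country code.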

PHP DOM remove child

I am trying to remove the parent node of <wcccanumber> from my XML if its content matches a certain criterion, but it only removes the <wcccanumber> node itself. How do I remove the whole parent node?
Here's my code:
$xml = new SimpleXMLElement('<xml/>');
if (file_exists("xml/units/E01.xml")) {
$xml = simplexml_load_file("xml/units/E01.xml");
echo "File exists";
echo "</br>";
$wcccanumber = "121202482";
foreach ($xml->call->wcccanumber as $call) {
if ($call == $wcccanumber) {
$dom = dom_import_simplexml($call);
$dom->parentNode->removeChild($dom);
$fp = fopen("xml/units/E01.xml","wb");
fwrite($fp,$xml->asXML());
fclose($fp);
}
}
}
Here is the xml:
<xml>
<call>
<wcccanumber>121202482</wcccanumber>
<currentcall>FALL</currentcall>
<county>W</county>
<id>82</id>
<location>234 E MAIN ST</location>
<callcreated>12:26:09</callcreated>
<station>HBM</station>
<units>E01</units>
<calltype>M</calltype>
<lat>45.5225067888299</lat>
<lng>-122.987112718574</lng>
<inputtime>12/18/2012 12:27:01 pm</inputtime>
</call>
</xml>
Iterate through the <call> elements and compare $call->wcccanumber with $wcccanumber. If they match, convert $call to a DOM node and remove it (parentNode->removeChild()).
foreach ($xml->call as $call) {
if ($call->wcccanumber == $wcccanumber) {
$dom = dom_import_simplexml($call);
$dom->parentNode->removeChild($dom);
$fp = fopen("xml/units/E01.xml","wb");
fwrite($fp,$xml->asXML());
fclose($fp);
}
}
If there are multiple deletions, it makes sense to save only once, after all deletions have been done.
$deletionCount = 0;
foreach ($xml->call as $call) {
if ($call->wcccanumber != $wcccanumber) {
continue;
}
$dom = dom_import_simplexml($call);
$dom->parentNode->removeChild($dom);
$deletionCount++;
}
if ($deletionCount) {
file_put_contents("xml/units/E01.xml", $xml->asXML());
}
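Removing nodes from inside a foreach over the same SimpleXML list relies on the iterator coping with a node that has just been detached. A sketch that avoids depending on that, assuming several <call> elements may match: select the whole <call> elements first with XPath, which returns a plain PHP array, and only then remove them. removeCallsByNumber is a hypothetical helper that works on an XML string, so it can be tried without touching xml/units/E01.xml:

```php
<?php
// Hypothetical helper: remove every <call> whose <wcccanumber> equals $number.
function removeCallsByNumber(string $xmlString, string $number): string
{
    $xml = new SimpleXMLElement($xmlString);
    // Select the <call> parents directly; xpath() returns an array,
    // so the removals below cannot disturb the selection.
    // Assumes $number contains no double quote characters.
    $calls = $xml->xpath('//call[wcccanumber="' . $number . '"]') ?: [];
    foreach ($calls as $call) {
        $dom = dom_import_simplexml($call);
        $dom->parentNode->removeChild($dom);
    }
    // SimpleXML and DOM share the same tree, so asXML() reflects the removals
    return $xml->asXML();
}

// Usage with the file from the question:
// $path = "xml/units/E01.xml";
// file_put_contents($path, removeCallsByNumber(file_get_contents($path), "121202482"));
```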

Convert only specified values from XML to CSV in PHP

I want to convert only the specified values from an XML file to CSV...
XML File:
<?xml version="1.0" encoding="UTF-8"?>
<file>
<PRODUCT sku="12345" price="129"/>
<PRODUCT sku="12356" price="150"/>
<PRODUCT sku="12367" price="160"/>
<PRODUCT sku="12389" price="190"/>
</file>
CSV file:
SKU,PRICE
12345,129
12356,150
12367,160
12389,190
But I want to get the price only for 12345, 12367 and 12389.
This is my starting file:
<?php
$filexml = 'file.xml';
if (file_exists($filexml)){
$xml = simplexml_load_file($filexml);
$file = fopen('file.csv', 'w');
$header = array('sku', 'price');
fputcsv($file, $header, ',', '"');
foreach ($xml->PRODUCT as $product){
$item = array();
$value1 = $product->attributes()->sku;
$value2 = $product->attributes()->price;
$item[] = $value1;
$item[] = $value2;
fputcsv($file, $item, ',', '"');
}
fclose($file);
}
?>
One option could be this, but it outputs "Array"; maybe something is wrong there.
<?php
$filexml = 'file.xml';
if (file_exists($filexml)){
$xml = simplexml_load_file($filexml);
$file = fopen('file.csv', 'w');
$header = array('sku', 'price');
$customvalue = array('12345', '12367', '12389');
fputcsv($file, $header, ',', '"');
foreach ($xml->PRODUCT as $product){
$item = array();
$value1 = $product->attributes()->sku;
$value2 = $product->attributes()->price;
$item[] = $customvalue;
$item[] = $value2;
fputcsv($file, $item, ',', '"');
}
fclose($file);
}
?>
Thanks
Ryan's solution:
<?php
$filexml = 'file.xml';
if (file_exists($filexml)){
$xml = simplexml_load_file($filexml);
$file = fopen('file.csv', 'w');
$header = array('sku', 'price');
$customvalue = array('12345', '12367', '12389');
fputcsv($file, $header, ',', '"');
foreach ($xml->PRODUCT as $product){
if ( in_array($product->attributes()->sku, $customvalue ) ) {
$item = array ();
$item[] = $product->attributes()->sku;
$item[] = $product->attributes()->price;
fputcsv($file, $item, ',', '"');
}
fclose($file);
}
?>
The output is correct, but I need to remove the unnecessary codes, because with a large file of about 7000 codes this produces a 300 MB CSV file.
This is the output:
12345,129
12345,129,12356,150
12367,160
12389,190
With large files I'm getting this:
12345,129
12345,129,123456,150,12367,160
12389,190,123456,150,12367,160,12345,129
12399,200
12399,200,12345,129,12389,160,123456,150
12399,200,12345,129,12389,160,123456,150,12399,200,12345,129,12389,160,123456,150
The specified codes from the array appear first, in the correct column, but the rows keep growing, which creates a huge CSV file and ends in a timeout or an out-of-memory error.
Thanks
You are on the right track. Add a check so that you only write the specified rows, and simplify pulling out the fields you want, something like this (only the foreach loop is shown; the rest stays the same):
foreach ($xml->PRODUCT as $product){
// Cast to string so in_array() compares plain strings
if ( in_array((string)$product->attributes()->sku, $customvalue ) ) {
$item = array ();
$item[] = (string)$product->attributes()->sku;
$item[] = (string)$product->attributes()->price;
fputcsv($file, $item, ',', '"');
}
}
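Since the real file has about 7000 products, one further tweak worth considering: look the SKUs up in a flipped array with isset(), which is a hash lookup, rather than the linear scan in_array() performs for every product. A sketch of that loop as a hypothetical function, writeSelectedPrices, that writes to any open stream:

```php
<?php
// Hypothetical helper: write sku,price rows only for the wanted SKUs.
function writeSelectedPrices(SimpleXMLElement $xml, array $wantedSkus, $out): void
{
    // array_flip turns ['12345', ...] into [12345 => 0, ...] for isset() lookups
    $wanted = array_flip($wantedSkus);
    fputcsv($out, ['sku', 'price'], ',', '"', '\\');
    foreach ($xml->PRODUCT as $product) {
        $sku = (string)$product['sku'];
        if (!isset($wanted[$sku])) {
            continue; // not one of the requested codes
        }
        // A fresh two-field row per matching product
        fputcsv($out, [$sku, (string)$product['price']], ',', '"', '\\');
    }
}

// Usage with the files from the question:
// $file = fopen('file.csv', 'w');
// writeSelectedPrices(simplexml_load_file('file.xml'), ['12345', '12367', '12389'], $file);
// fclose($file);
```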
