Just wondering if anyone can point me in the direction of some tips or a script that will help me create an XML file from an original CSV file, using PHP.
Cheers
This is quite easy to do: use fgetcsv to read the CSV file and DOMDocument to write the XML file. This version uses the headers from the file as the element names in the XML document.
<?php
error_reporting(E_ALL | E_STRICT);
ini_set('display_errors', true);
ini_set('auto_detect_line_endings', true);
$inputFilename = 'input.csv';
$outputFilename = 'output.xml';
// Open csv to read
$inputFile = fopen($inputFilename, 'rt');
// Get the headers of the file
$headers = fgetcsv($inputFile);
// Create a new dom document with pretty formatting
$doc = new DOMDocument();
$doc->formatOutput = true;
// Add a root node to the document
$root = $doc->createElement('rows');
$root = $doc->appendChild($root);
// Loop through each row creating a <row> node with the correct data
while (($row = fgetcsv($inputFile)) !== false)
{
    $container = $doc->createElement('row');
    foreach ($headers as $i => $header)
    {
        $child = $doc->createElement($header);
        $child = $container->appendChild($child);
        $value = $doc->createTextNode($row[$i]);
        $value = $child->appendChild($value);
    }
    $root->appendChild($container);
}
// Close the input file and output the XML
fclose($inputFile);
echo $doc->saveXML();
The code given above creates an XML document but does not store it anywhere on disk. So replace echo $doc->saveXML(); with:
$strxml = $doc->saveXML();
$handle = fopen($outputFilename, "w");
fwrite($handle, $strxml);
fclose($handle);
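Note that DOMDocument can also write the file for you; a one-line alternative (standard DOMDocument API, not part of the original answers):
// DOMDocument::save() writes the XML straight to a file
$doc->save($outputFilename);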
There are a number of sites out there that will do it for you.
If this is going to be a regular process rather than a one-time thing, it may be better to parse the CSV and output the XML yourself:
$csv = file("path/to/csv.csv");
foreach ($csv as $line)
{
    $data = explode(",", $line);
    echo "<xmltag>" . $data[0] . "</xmltag>";
    // etc...
}
Look up PHP's file and string functions.
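A note of caution: explode() breaks on quoted fields that contain commas, and the values are not XML-escaped. A minimal sketch (my own, with placeholder file and tag names) that uses fgetcsv and htmlspecialchars instead:
<?php
// Sketch: stream a CSV out as simple XML, with proper quoting and escaping.
// "data.csv", <records>, <record> and <field> are placeholder names.
$handle = fopen("data.csv", "r");
echo "<records>\n";
while (($fields = fgetcsv($handle)) !== false) {
    echo "  <record>\n";
    foreach ($fields as $field) {
        // Escape &, <, > and quotes so the output stays well-formed XML
        echo "    <field>" . htmlspecialchars($field, ENT_XML1 | ENT_QUOTES) . "</field>\n";
    }
    echo "  </record>\n";
}
echo "</records>\n";
fclose($handle);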
Sorry to bring up an old thread, but I tried this script and I'm getting a DOMException error.
The headers of our CSV files are display_name, office_number and mobile_number, and the error I'm receiving is: DOMDocument->createElement('\xEF\xBB\xBFdisplay_name') #1 {main}
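That \xEF\xBB\xBF prefix is the UTF-8 byte order mark: the CSV file starts with a BOM, the BOM gets glued onto the first header name, and createElement() then rejects it as an invalid element name. A minimal fix (a sketch, assuming the header-reading code from the answer above) is to strip the BOM from the first header before creating the elements:
// Read the header row as before
$headers = fgetcsv($inputFile);
// Strip a UTF-8 BOM (EF BB BF) from the first header if present,
// otherwise createElement() throws a DOMException for the invalid name
if (isset($headers[0])) {
    $headers[0] = preg_replace('/^\xEF\xBB\xBF/', '', $headers[0]);
}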
Related
I have this CSV file: http://www.gamesdeal.com/media/feedgenerator/Gamekey.csv
And I get this error from PHP:
PHP Notice: Undefined offset: 6 in
The problem is that I don't create the CSV file myself; it comes from a store, so I can't modify it... Does somebody know how I can fix this error?
Here is my code:
function csvToXML($inputFilename, $outputFilename, $delimiter = ','){
    // Open csv to read
    $inputFile = fopen($inputFilename, 'rt');
    // Get the headers of the file
    $headers = fgetcsv($inputFile, 0, $delimiter);
    // Create a new dom document with pretty formatting
    $doc = new DOMDocument('1.0', 'utf-8');
    $doc->preserveWhiteSpace = false;
    $doc->formatOutput = true;
    // Add a root node to the document
    $root = $doc->createElement('products');
    $root = $doc->appendChild($root);
    while (($row = fgetcsv($inputFile, 0, $delimiter)) !== false) {
        $container = $doc->createElement('product');
        foreach ($headers as $i => $header) {
            $child = $doc->createElement($header);
            $child = $container->appendChild($child);
            $value = $doc->createTextNode($row[$i]);
            $value = $child->appendChild($value);
        }
        $root->appendChild($container);
    }
    $strxml = $doc->saveXML();
    $handle = fopen($outputFilename, 'w');
    fwrite($handle, $strxml);
    fclose($handle);
}
Here is the problem:
header: products_price <tab> price_currency
data: 5.45 EUR (no tab between 5.45 and EUR)
So the header defines 7 fields, but the data rows only contain 6 (most records also have no EAN value, but there is a tab at the end, so that should be fine).
To fix this you could:
read all the fields manually, or
first replace products_price <tab> price_currency with products_price price_currency in the header and remove price_currency from $headers, or
somehow let the parser know there are only 6 fields instead of 7.
You will probably have to correct the price field afterwards in either case.
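One simple way to avoid the undefined offset notice (a sketch of my own, assuming the csvToXML() function above): pad or trim each row to the number of headers before the foreach, so $row[$i] is always defined. The price field will still need correcting afterwards.
// Inside the while loop, before the foreach over $headers:
// pad short rows with empty strings and drop any extra trailing fields
$row = array_pad($row, count($headers), '');
$row = array_slice($row, 0, count($headers));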
I have this code:
libxml_use_internal_errors(TRUE);
$dom = new DOMDocument;
$dom->Load('/home/dom/public_html/cache/feed.xml');
$xmlor = '/home/dom/public_html/cache/feed.xml';
// open file and prepare mods
$fh = fopen($xmlor, 'r+');
$data = fread($fh, filesize($xmlor));
$dmca_claim_jpg = array( 'baduser_.jpg','user78.jpg' );
$dmca_claim_link = array( 'mydomain.com/baduser_','mydomain.com/user78' );
echo "Opening local XML for edit..." . PHP_EOL;
$new_data = str_replace("extdomain.com", "mydomain.com", $data);
$new_data2 = str_replace($dmca_claim_jpg, "DMCA.jpg", $data);
$new_data3 = str_replace($dmca_claim_link, "#", $data);
fclose($fh);
// run mods
$fh = fopen($xmlor, 'r+');
fwrite($fh, $new_data);
fwrite($fh, $new_data2);
fwrite($fh, $new_data3);
echo "Updated feed URL and DMCA claims in local XML..." . PHP_EOL;
fclose($fh);
It does not give any errors when executing, but it messes up the XML file by removing the first two lines (weird) when fwriting $new_data2 and $new_data3 to the file.
It works fine when writing only $new_data...
I think it has to do with the $dmca_claim_jpg / $dmca_claim_link arrays.
Parse the XML using SimpleXML or DOMDocument instead; it's cleaner and gives you a standard OOP way of accessing the nodes.
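As for why the file ends up mangled: $new_data2 and $new_data3 are each built from the original $data rather than from the previous result, and writing all three back through an 'r+' handle (which neither truncates the file nor rewinds between writes) just stacks them into the file. If you do stay with string replacement, a minimal sketch (my own, reusing the question's variables) that chains the replacements and rewrites the file once:
// Apply all replacements to one string, in sequence
$new_data = str_replace("extdomain.com", "mydomain.com", $data);
$new_data = str_replace($dmca_claim_jpg, "DMCA.jpg", $new_data);
$new_data = str_replace($dmca_claim_link, "#", $new_data);
// Overwrite the file in one go; file_put_contents truncates before writing
file_put_contents($xmlor, $new_data);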
I have a badly formatted JSON file (doc1.json):
{"text":"xxx","user":{"id":96525997,"name":"ss"},"id":29005752194568192}
{"text":"yyy","user":{"id":32544632,"name":"cc"},"id":29005753951977472}
{...}{...}
And I have to change it in this:
{"u":[
{"text":"xxx","user":{"id":96525997,"name":"ss"},"id":29005752194568192},
{"text":"yyy","user":{"id":32544632,"name":"cc"},"id":29005753951977472},
{...},{...}
]}
Can I do this in a PHP file?
// Path to the JSON file (doc1.json from the question)
$fileLocation = 'doc1.json';
// Get the contents of the file
$fileStr = file_get_contents($fileLocation);
// Make proper json
$fileStr = str_replace('}{', '},{', $fileStr);
// Create new json
$fileStr = '{"u":[' . $fileStr . ']}';
// Insert the new string into the file
file_put_contents($fileLocation, $fileStr);
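A small follow-up check (my own suggestion, not part of the answer above): after the string surgery you can verify the result is valid JSON just before the final file_put_contents() call:
// json_decode() returns null for invalid JSON, so bail out rather than
// overwrite the file with something unusable
if (json_decode($fileStr) === null) {
    die('Result is not valid JSON; aborting.');
}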
I would build the data structure you want from the file:
$file_path = '/path/to/file';
$array_from_file = file($file_path);

// set up object container
$obj = new stdClass;
$obj->u = array();

// iterate through lines from file
// load data into object container
foreach ($array_from_file as $json) {
    $line_obj = json_decode($json);
    if (is_null($line_obj)) {
        throw new Exception('We have some bad JSON here.');
    } else {
        $obj->u[] = $line_obj;
    }
}

// encode to JSON
$json = json_encode($obj);

// overwrite existing file
// use 'w' mode to truncate file and open for writing
$fh = fopen($file_path, 'w');
// write JSON to file
$bytes_written = fwrite($fh, $json);
fclose($fh);
This assumes each of the JSON object representations in your original file is on a separate line.
I prefer this approach over string manipulation, as decoding each line gives you a built-in check that the input is valid JSON that can be de-serialized. If the script completes successfully, that guarantees the output can be de-serialized by whoever consumes it.
I am not sure why this was working fine last night, but this morning I am getting:
Fatal error: Out of memory (allocated 1611137024) (tried to allocate 1610350592 bytes) in /home/twitcast/public_html/system/index.php on line 121
The section of code being run is as follows:
function podcast()
{
    $fetch = new server();
    $fetch->connect("TCaster");
    $collection = $fetch->db->shows;
    // find everything in the collection
    $cursor = $collection->find();
    if ($cursor->count() > 0)
    {
        $test = array();
        // iterate through the results
        while ($cursor->hasNext()) {
            $test[] = $cursor->getNext();
        }
        $i = 0;
        foreach ($test as $d) {
            for ($i = 0; $i <= 3; $i++) {
                $url = $d["streams"][$i];
                $xml = file_get_contents($url);
                $doc = new DOMDocument();
                $doc->preserveWhiteSpace = false;
                $doc->loadXML($xml); // $xml = file_get_contents( "http://www.c3carlingford.org.au/podcast/C3CiTunesFeed.xml")
                // Initialize XPath
                $xpath = new DOMXpath($doc);
                // Register the itunes namespace
                $xpath->registerNamespace('itunes', 'http://www.itunes.com/dtds/podcast-1.0.dtd');
                $items = $doc->getElementsByTagName('item');
                foreach ($items as $item) {
                    $title = $xpath->query('title', $item)->item(0)->nodeValue;
                    $published = strtotime($xpath->query('pubDate', $item)->item(0)->nodeValue);
                    $author = $xpath->query('itunes:author', $item)->item(0)->nodeValue;
                    $summary = $xpath->query('itunes:summary', $item)->item(0)->nodeValue;
                    $enclosure = $xpath->query('enclosure', $item)->item(0);
                    $url = $enclosure->attributes->getNamedItem('url')->value;
                    $fname = basename($url);
                    $collection = $fetch->db->shows_episodes;
                    $cursorfind = $collection->find(array("internal_url" => "http://twitcatcher.russellharrower.com/videos/$fname"));
                    if ($cursorfind->count() < 1)
                    {
                        $copydir = "/home/twt/public_html/videos/";
                        $data = file_get_contents($url);
                        $file = fopen($copydir . $fname, "w+");
                        fputs($file, $data);
                        fclose($file);
                        $collection->insert(array("show_id" => new MongoId($d["_id"]), "stream" => $i, "episode_title" => $title, "episode_summary" => $summary, "published" => $published, "internal_url" => "http://twitcatcher.russellharrower.com/videos/$fname"));
                        echo "$title <br> $published <br> $summary <br> $url<br><br>\n\n";
                    }
                }
            }
        }
    }
}
line 121 is
$data = file_get_contents($url);
You want to add 1.6GB of memory usage for a single PHP thread? While you can increase the memory limit, my strong advice is to look at another way of doing what you want.
Probably the easiest solution: you can use cURL to request a byte range of the source file (using cURL is wiser than file_get_contents anyway for remote files). You can get 100K at a time, write it to the local file, then get the next 100K and append it, and so on until the entire file is pulled in.
You may also do something with streams, but it gets a little more complex. This may be your only option if the remote server won't let you request part of a file by bytes.
Finally, there are Linux commands such as wget, run through exec(), if your server has permission.
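A minimal sketch (my own, not from the answers here) of the streaming idea: hand cURL an open file handle via CURLOPT_FILE, so the download goes straight to disk and never has to fit in memory. $url, $copydir and $fname are the same values used in the question.
// Stream the remote file straight to disk instead of buffering it in memory
$out = fopen($copydir . $fname, "w");
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FILE, $out);           // write the response body to $out
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if any
curl_exec($ch);
curl_close($ch);
fclose($out);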
memory_limit: take a look at this directive. I suppose that is what you need.
Or you may try to use copy() instead of reading the file into memory (it is a video file, as I understand it, so it is not strange that it takes a lot of memory):
$copydir = "/home/twt/public_html/videos/";
copy($url, $copydir . $fname);
It looks like the files you opened last night were simply smaller.
My PHP script successfully reads all of the text from a .docx file, but I cannot figure out where the line breaks should go, so the text ends up bunched together and hard to read (one huge paragraph). I have manually gone over all of the XML files inside the document but still cannot work it out.
Here are the functions I use to retrieve the file data and return the plain text.
public function read($FilePath)
{
    // Save name of the file
    parent::SetDocName($FilePath);
    $Data = $this->docx2text($FilePath);
    // Decode the &lt; / &gt; entities left in the extracted text
    $Data = str_replace("&lt;", "<", $Data);
    $Data = str_replace("&gt;", ">", $Data);
    // Convert line breaks to <br /> for display
    $Breaks = array("\r\n", "\n", "\r");
    $Data = str_replace($Breaks, '<br />', $Data);
    $this->Content = $Data;
}
function docx2text($filename) {
    return $this->readZippedXML($filename, "word/document.xml");
}
function readZippedXML($archiveFile, $dataFile)
{
    // Create new ZIP archive
    $zip = new ZipArchive;
    // Open received archive file
    if (true === $zip->open($archiveFile))
    {
        // If done, search for the data file in the archive
        if (($index = $zip->locateName($dataFile)) !== false)
        {
            // If found, read it to the string
            $data = $zip->getFromIndex($index);
            // Close archive file
            $zip->close();
            // Load XML from a string
            // Skip errors and warnings
            $xml = DOMDocument::loadXML($data, LIBXML_NOENT | LIBXML_XINCLUDE | LIBXML_NOERROR | LIBXML_NOWARNING);
            $xmldata = $xml->saveXML();
            //$xmldata = str_replace("</w:t>", "\r\n", $xmldata);
            // Return data without XML formatting tags
            return strip_tags($xmldata);
        }
        $zip->close();
    }
    // In case of failure return empty string
    return "";
}
It is actually quite a simple answer. All you need to do is add this line in readZippedXML():
$xmldata = str_replace("</w:p>", "\r\n", $xmldata);
This is because </w:p> is what Word uses to mark the end of a paragraph, e.g.
<w:p>This is a paragraph.</w:p>
<w:p>And a second one.</w:p>
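Concretely (a sketch of where that line goes in readZippedXML() above, just before the tags are stripped):
// Turn Word paragraph ends into line breaks before stripping the tags,
// otherwise every paragraph runs together into one block of text
$xmldata = str_replace("</w:p>", "\r\n", $xmldata);
// Return data without XML formatting tags
return strip_tags($xmldata);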
Actually, why don't you use OpenXML? I think it works with PHP too, and then you don't have to go down to the nitty-gritty XML file details.
Here is a link:
http://openxmldeveloper.org/articles/4606.aspx