I have a OFX file downloaded from Citibank, this file has a DTD defined at http://www.ofx.net/DownloadPage/Files/ofx102spec.zip (file OFXBANK.DTD), the OFX file appear to be SGML valid.
I'm trying with DomDocument of PHP 5.4.13, but I get several warning and file is not parsed. My Code is:
$file = "source/ACCT_013.OFX";
$dtd = "source/ofx102spec/OFXBANK.DTD";
$doc = new DomDocument();
$doc->loadHTMLFile($file);
$doc->schemaValidate($dtd);
$dom->validateOnParse = true;
The OFX file start as:
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE
<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<DTSERVER>20130331073401
<LANGUAGE>SPA
</SONRS>
</SIGNONMSGSRSV1>
<BANKMSGSRSV1>
<STMTTRNRS>
<TRNUID>0
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<STMTRS>
<CURDEF>COP
<BANKACCTFROM> ...
I'm open to install and use any program in Server (Centos) for call from PHP.
PD: This class http://www.phpclasses.org/package/5778-PHP-Parse-and-extract-financial-records-from-OFX-files.html don't work for me.
Well first of all even XML is a subset of SGML a valid SGML file must not be a well-formed XML file. XML is more strict and does not use all features that SGML offers.
As DOMDocument is XML (and not SGML) based, this is not really compatible.
Next to that problem, please see 2.2 Open Financial Exchange Headers in Ofexfin1.doc it explains you that
The contents of an Open Financial Exchange file consist of a simple set of headers followed by contents defined by that header
and further on:
A blank line follows the last header. Then (for type OFXSGML), the SGML-readable data begins with the <OFX> tag.
So locate the first blank line and strip everyhing until there. Then load the SGML part into DOMDocument by converting the SGML into XML first:
$source = fopen('file.ofx', 'r');
if (!$source) {
throw new Exception('Unable to open OFX file.');
}
// skip headers of OFX file
$headers = array();
$charsets = array(
1252 => 'WINDOWS-1251',
);
while(!feof($source)) {
$line = trim(fgets($source));
if ($line === '') {
break;
}
list($header, $value) = explode(':', $line, 2);
$headers[$header] = $value;
}
$buffer = '';
// dead-cheap SGML to XML conversion
// see as well http://www.hanselman.com/blog/PostprocessingAutoClosedSGMLTagsWithTheSGMLReader.aspx
while(!feof($source)) {
$line = trim(fgets($source));
if ($line === '') continue;
$line = iconv($charsets[$headers['CHARSET']], 'UTF-8', $line);
if (substr($line, -1, 1) !== '>') {
list($tag) = explode('>', $line, 2);
$line .= '</' . substr($tag, 1) . '>';
}
$buffer .= $line ."\n";
}
// use DOMDocument with non-standard recover mode
$doc = new DOMDocument();
$doc->recover = true;
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
$save = libxml_use_internal_errors(true);
$doc->loadXML($buffer);
libxml_use_internal_errors($save);
echo $doc->saveXML();
This code-example then outputs the following (re-formatted) XML which also shows that DOMDocument loaded the data properly:
<?xml version="1.0"?>
<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0</CODE>
<SEVERITY>INFO</SEVERITY>
</STATUS>
<DTSERVER>20130331073401</DTSERVER>
<LANGUAGE>SPA</LANGUAGE>
</SONRS>
</SIGNONMSGSRSV1>
<BANKMSGSRSV1>
<STMTTRNRS>
<TRNUID>0</TRNUID>
<STATUS>
<CODE>0</CODE>
<SEVERITY>INFO</SEVERITY>
</STATUS>
<STMTRS><CURDEF>COP</CURDEF><BANKACCTFROM> ...</BANKACCTFROM>
</STMTRS>
</STMTTRNRS>
</BANKMSGSRSV1>
</OFX>
I do not know whether or not this can be validated against the DTD then. Maybe this works. Additionally if the SGML is not written with the values that are of a tag on the same line (and only a single element on each line is required), then this fragile conversion will break.
Simplest OFX parse into an array with easy access to all values and transactions.
function parseOFX($ofx) {
$OFXArray=explode("<",$ofx);
$a=array();
foreach ($OFXArray as $v) {
$pair=explode(">",$v);
if (isset($pair[1])) {
if ($pair[1]!=NULL) {
if (isset($a[$pair[0]])) {
if (is_array($a[$pair[0]])) {
$a[$pair[0]][]=$pair[1];
} else {
$temp=$a[$pair[0]];
$a[$pair[0]]=array();
$a[$pair[0]][]=$temp;
$a[$pair[0]][]=$pair[1];
}
} else {
$a[$pair[0]]=$pair[1];
}
}
}
}
return $a;
}
i use this:
$source = utf8_encode(file_get_contents('a.ofx'));
//add end tag
$source = preg_replace('#^<([^>]+)>([^\r\n]+)\r?\n#mU', "<$1>$2</$1>\n", $source);
//skip header
$source = substr($source, strpos($source,'<OFX>'));
//convert to array
$xml = simplexml_load_string($source);
$array = json_decode(json_encode($xml),true);
print_r($array);
Related
What is the best way to format XML within a PHP class.
$xml = "<element attribute=\"something\">...</element>";
$xml = '<element attribute="something">...</element>';
$xml = '<element attribute=\'something\'>...</element>';
$xml = <<<EOF
<element attribute="something">
</element>
EOF;
I'm pretty sure it is the last one!
With DOM you can do
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadXML('<root><foo><bar>baz</bar></foo></root>');
$dom->formatOutput = TRUE;
echo $dom->saveXML();
gives (live demo)
<?xml version="1.0"?>
<root>
<foo>
<bar>baz</bar>
</foo>
</root>
See DOMDocument::formatOutput and DOMDocument::preserveWhiteSpace properties description.
This function works perfectlly as you want you don't have to use any xml dom library or nething just pass the xml generated string into it and it will parse and generate the new one with tabs and line breaks.
function formatXmlString($xml){
$xml = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $xml);
$token = strtok($xml, "\n");
$result = '';
$pad = 0;
$matches = array();
while ($token !== false) :
if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) :
$indent=0;
elseif (preg_match('/^<\/\w/', $token, $matches)) :
$pad--;
$indent = 0;
elseif (preg_match('/^<\w[^>]*[^\/]>.*$/', $token, $matches)) :
$indent=1;
else :
$indent = 0;
endif;
$line = str_pad($token, strlen($token)+$pad, ' ', STR_PAD_LEFT);
$result .= $line . "\n";
$token = strtok("\n");
$pad += $indent;
endwhile;
return $result;
}
//Here is example using XMLWriter
$w = new XMLWriter;
$w->openMemory();
$w->setIndent(true);
$w->startElement('foo');
$w->startElement('bar');
$w->writeElement("key", "value");
$w->endElement();
$w->endElement();
echo $w->outputMemory();
//out put
<foo>
<bar>
<key>value</key>
</bar>
</foo>
The first is better if you plan to embed values into the XML, The second is better for humans to read. Neither is good if you intend really work with XML.
However if you intend to perform a simple fire and forget function that takes XML as a input parameter, then I would say use the first method because you will need to embed parameters at some point.
I personally would use the PHP class simplexml, it's very easy to use and it's built in xpath support makes detailing the data returned in XML a dream.
I am trying to convert my CSV file to XML. I am using this script below which I got from a post. But it's not working on my side. Any idea why I am getting this error?
error_reporting(E_ALL | E_STRICT);
ini_set('display_errors', true);
ini_set('auto_detect_line_endings', true);
$inputFilename = 'test.csv';
$outputFilename = 'test.xml';
// Open csv to read
$inputFile = fopen($inputFilename, 'rt');
// Get the headers of the file
$headers = fgetcsv($inputFile);
// Create a new dom document with pretty formatting
$doc = new DomDocument();
$doc->formatOutput = true;
// Add a root node to the document
$root = $doc->createElement('rows');
$root = $doc->appendChild($root);
// Loop through each row creating a <row> node with the correct data
while (($row = fgetcsv($inputFile)) !== FALSE)
{
$container = $doc->createElement('row');
foreach($headers as $i => $header)
{
$child = $doc->createElement($header);
$child = $container->appendChild($child);
$value = $doc->createTextNode($row[$i]);
$value = $child->appendChild($value);
}
$root->appendChild($container);
}
$strxml = $doc->saveXML();
The error I am getting is here:
Uncaught exception 'DOMException' with message 'Invalid Character Error'
Here is my csv file:
name, email, test
john,john#foobar.com,blah
mary,mary#blah.com,something
jane,jan#something.com,blarg
bob,bob#test.com,asdfsfd
Your header row has spaces around the field names, spaces don't sit well with the names of XML elements. Simply trim() the field names...
$child = $doc->createElement(trim($header));
Here are the codes:
$doc = new DomDocument('1.0');
// create root node
$root = $doc->createElement('root');
$root = $doc->appendChild($root);
$signed_values = array('a' => 'eee', 'b' => 'sd', 'c' => 'df');
// process one row at a time
foreach ($signed_values as $key => $val) {
// add node for each row
$occ = $doc->createElement('error');
$occ = $root->appendChild($occ);
// add a child node for each field
foreach ($signed_values as $fieldname => $fieldvalue) {
$child = $doc->createElement($fieldname);
$child = $occ->appendChild($child);
$value = $doc->createTextNode($fieldvalue);
$value = $child->appendChild($value);
}
}
// get completed xml document
$xml_string = $doc->saveXML() ;
echo $xml_string;
If I print it in the browser I don't get nice XML structure like
<xml> \n tab <child> etc.
I just get
<xml><child>ee</child></xml>
And I want to be utf-8
How is this all possible to do?
You can try to do this:
...
// get completed xml document
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
$xml_string = $doc->saveXML();
echo $xml_string;
You can make set these parameter right after you've created the DOMDocument as well:
$doc = new DomDocument('1.0');
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
That's probably more concise. Output in both cases is (Demo):
<?xml version="1.0"?>
<root>
<error>
<a>eee</a>
<b>sd</b>
<c>df</c>
</error>
<error>
<a>eee</a>
<b>sd</b>
<c>df</c>
</error>
<error>
<a>eee</a>
<b>sd</b>
<c>df</c>
</error>
</root>
I'm not aware how to change the indentation character(s) with DOMDocument. You could post-process the XML with a line-by-line regular-expression based replacing (e.g. with preg_replace):
$xml_string = preg_replace('/(?:^|\G) /um', "\t", $xml_string);
Alternatively, there is the tidy extension with tidy_repair_string which can pretty print XML data as well. It's possible to specify indentation levels with it, however tidy will never output tabs.
tidy_repair_string($xml_string, ['input-xml'=> 1, 'indent' => 1, 'wrap' => 0]);
With a SimpleXml object, you can simply
$domxml = new DOMDocument('1.0');
$domxml->preserveWhiteSpace = false;
$domxml->formatOutput = true;
/* #var $xml SimpleXMLElement */
$domxml->loadXML($xml->asXML());
$domxml->save($newfile);
$xml is your simplexml object
So then you simpleXml can be saved as a new file specified by $newfile
<?php
$xml = $argv[1];
$dom = new DOMDocument();
// Initial block (must before load xml string)
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
// End initial block
$dom->loadXML($xml);
$out = $dom->saveXML();
print_R($out);
Tried all the answers but none worked. Maybe it's because I'm appending and removing childs before saving the XML.
After a lot of googling found this comment in the php documentation. I only had to reload the resulting XML to make it work.
$outXML = $xml->saveXML();
$xml = new DOMDocument();
$xml->preserveWhiteSpace = false;
$xml->formatOutput = true;
$xml->loadXML($outXML);
$outXML = $xml->saveXML();
// ##### IN SUMMARY #####
$xmlFilepath = 'test.xml';
echoFormattedXML($xmlFilepath);
/*
* echo xml in source format
*/
function echoFormattedXML($xmlFilepath) {
header('Content-Type: text/xml'); // to show source, not execute the xml
echo formatXML($xmlFilepath); // format the xml to make it readable
} // echoFormattedXML
/*
* format xml so it can be easily read but will use more disk space
*/
function formatXML($xmlFilepath) {
$loadxml = simplexml_load_file($xmlFilepath);
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($loadxml->asXML());
$formatxml = new SimpleXMLElement($dom->saveXML());
//$formatxml->saveXML("testF.xml"); // save as file
return $formatxml->saveXML();
} // formatXML
Two different issues here:
Set the formatOutput and preserveWhiteSpace attributes to TRUE to generate formatted XML:
$doc->formatOutput = TRUE;
$doc->preserveWhiteSpace = TRUE;
Many web browsers (namely Internet Explorer and Firefox) format XML when they display it. Use either the View Source feature or a regular text editor to inspect the output.
See also xmlEncoding and encoding.
This is a slight variation of the above theme but I'm putting here in case others hit this and cannot make sense of it ...as I did.
When using saveXML(), preserveWhiteSpace in the target DOMdocument does not apply to imported nodes (as at PHP 5.6).
Consider the following code:
$dom = new DOMDocument(); //create a document
$dom->preserveWhiteSpace = false; //disable whitespace preservation
$dom->formatOutput = true; //pretty print output
$documentElement = $dom->createElement("Entry"); //create a node
$dom->appendChild ($documentElement); //append it
$message = new DOMDocument(); //create another document
$message->loadXML($messageXMLtext); //populate the new document from XML text
$node=$dom->importNode($message->documentElement,true); //import the new document content to a new node in the original document
$documentElement->appendChild($node); //append the new node to the document Element
$dom->saveXML($dom->documentElement); //print the original document
In this context, the $dom->saveXML(); statement will NOT pretty print the content imported from $message, but content originally in $dom will be pretty printed.
In order to achieve pretty printing for the entire $dom document, the line:
$message->preserveWhiteSpace = false;
must be included after the $message = new DOMDocument(); line - ie. the document/s from which the nodes are imported must also have preserveWhiteSpace = false.
based on the answer by #heavenevil
This function pretty prints using the browser
function prettyPrintXmlToBrowser(SimpleXMLElement $xml)
{
$domXml = new DOMDocument('1.0');
$domXml->preserveWhiteSpace = false;
$domXml->formatOutput = true;
$domXml->loadXML($xml->asXML());
$xmlString = $domXml->saveXML();
echo nl2br(str_replace(' ', ' ', htmlspecialchars($xmlString)));
}
I'm using the W3 validator API, and I get this kind of response:
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Body>
<m:markupvalidationresponse env:encodingStyle="http://www.w3.org/2003/05/soap-encoding" xmlns:m="http://www.w3.org/2005/10/markup-validator">
<m:uri>http://myurl.com/</m:uri>
<m:checkedby>http://validator.w3.org/</m:checkedby>
<m:doctype>-//W3C//DTD XHTML 1.1//EN</m:doctype>
<m:charset>utf-8</m:charset>
<m:validity>false</m:validity>
<m:errors>
<m:errorcount>1</m:errorcount>
<m:errorlist>
<m:error>
<m:line>7</m:line>
<m:col>80</m:col>
<m:message>character data is not allowed here</m:message>
<m:messageid>63</m:messageid>
<m:explanation> <![CDATA[
PAGE HTML IS HERE
]]>
</m:explanation>
<m:source><![CDATA[ HTML AGAIN ]]></m:source>
</m:error>
...
</m:errorlist>
</m:errors>
<m:warnings>
<m:warningcount>0</m:warningcount>
<m:warninglist>
</m:warninglist>
</m:warnings>
</m:markupvalidationresponse>
</env:Body>
</env:Envelope>
How can I extract some variables from there?
I need validity, errorcount and if possible from the list of errors: line, col, and message :)
Is there a easy way to do this?
You can load the XML string into a SimpleXMLElement with simplexml_load_string and then find the attributes using XPath. It's important to register the namespaces involved with registerXPathNamespace before using XPath.
$xml = file_get_contents('example.xml'); // $xml should be the XML source string
$doc = simplexml_load_string($xml);
$doc->registerXPathNamespace('m', 'http://www.w3.org/2005/10/markup-validator');
$nodes = $doc->xpath('//m:markupvalidationresponse/m:validity');
$validity = strval($nodes[0]);
echo 'is valid: ', $validity, "\n";
$nodes = $doc->xpath('//m:markupvalidationresponse/m:errors/m:errorcount');
$errorcount = strval($nodes[0]);
echo 'total errors: ', $errorcount, "\n";
$nodes = $doc->xpath('//m:markupvalidationresponse/m:errors/m:errorlist/m:error');
foreach ($nodes as $node) {
$nodes = $node->xpath('m:line');
$line = strval($nodes[0]);
$nodes = $node->xpath('m:col');
$col = strval($nodes[0]);
$nodes = $node->xpath('m:message');
$message = strval($nodes[0]);
echo 'line: ', $line, ', column: ', $col, ' message: ', $message, "\n";
}
You should be using a SOAP library to get this in the first place. There are various options you can try for this; nusoap, http://php.net/manual/en/book.soap.php, the zend framework also has SOAP client and server which you can use. Whatever implementation you use will allow you to get the data in some way. Doing a var_dump() on whatever holds the initial response should aid you in navigating through it.
If you rather use the DOMDocument class from php. You don't have to know Xpath to get this working. An example:
$url = "http://www.google.com";
$xml = new DOMDocument();
$xml->load("http://validator.w3.org/check?uri=".urlencode($url)."&output=soap12");
$doctype = $xml->getElementsByTagNameNS('http://www.w3.org/2005/10/markup-validator', 'doctype')->item(0)->nodeValue;
$valid = $xml->getElementsByTagNameNS('http://www.w3.org/2005/10/markup-validator', 'validity')->item(0)->nodeValue;
$errorcount = $xml->getElementsByTagNameNS('http://www.w3.org/2005/10/markup-validator', 'errorcount')->item(0)->nodeValue;
$warningcount = $xml->getElementsByTagNameNS('http://www.w3.org/2005/10/markup-validator', 'warningcount')->item(0)->nodeValue;
$errors = $xml->getElementsByTagNameNS('http://www.w3.org/2005/10/markup-validator', 'error');
foreach ($errors as $error) {
echo "<br>line: ".$error->childNodes->item(1)->nodeValue;
echo "<br>col: ".$error->childNodes->item(3)->nodeValue;
echo "<br>message: ".$error->childNodes->item(5)->nodeValue;
}
// item() arguments are uneven because the empty text between tags is counted as an item.
What is the best way to format XML within a PHP class.
$xml = "<element attribute=\"something\">...</element>";
$xml = '<element attribute="something">...</element>';
$xml = '<element attribute=\'something\'>...</element>';
$xml = <<<EOF
<element attribute="something">
</element>
EOF;
I'm pretty sure it is the last one!
With DOM you can do
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadXML('<root><foo><bar>baz</bar></foo></root>');
$dom->formatOutput = TRUE;
echo $dom->saveXML();
gives (live demo)
<?xml version="1.0"?>
<root>
<foo>
<bar>baz</bar>
</foo>
</root>
See DOMDocument::formatOutput and DOMDocument::preserveWhiteSpace properties description.
This function works perfectlly as you want you don't have to use any xml dom library or nething just pass the xml generated string into it and it will parse and generate the new one with tabs and line breaks.
function formatXmlString($xml){
$xml = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $xml);
$token = strtok($xml, "\n");
$result = '';
$pad = 0;
$matches = array();
while ($token !== false) :
if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) :
$indent=0;
elseif (preg_match('/^<\/\w/', $token, $matches)) :
$pad--;
$indent = 0;
elseif (preg_match('/^<\w[^>]*[^\/]>.*$/', $token, $matches)) :
$indent=1;
else :
$indent = 0;
endif;
$line = str_pad($token, strlen($token)+$pad, ' ', STR_PAD_LEFT);
$result .= $line . "\n";
$token = strtok("\n");
$pad += $indent;
endwhile;
return $result;
}
//Here is example using XMLWriter
$w = new XMLWriter;
$w->openMemory();
$w->setIndent(true);
$w->startElement('foo');
$w->startElement('bar');
$w->writeElement("key", "value");
$w->endElement();
$w->endElement();
echo $w->outputMemory();
//out put
<foo>
<bar>
<key>value</key>
</bar>
</foo>
The first is better if you plan to embed values into the XML, The second is better for humans to read. Neither is good if you intend really work with XML.
However if you intend to perform a simple fire and forget function that takes XML as a input parameter, then I would say use the first method because you will need to embed parameters at some point.
I personally would use the PHP class simplexml, it's very easy to use and it's built in xpath support makes detailing the data returned in XML a dream.