What is the best way to format XML within a PHP class?
$xml = "<element attribute=\"something\">...</element>";
$xml = '<element attribute="something">...</element>';
$xml = '<element attribute=\'something\'>...</element>';
$xml = <<<EOF
<element attribute="something">
</element>
EOF;
I'm pretty sure it is the last one!
With DOM you can do
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadXML('<root><foo><bar>baz</bar></foo></root>');
$dom->formatOutput = TRUE;
echo $dom->saveXML();
gives (live demo)
<?xml version="1.0"?>
<root>
  <foo>
    <bar>baz</bar>
  </foo>
</root>
See DOMDocument::formatOutput and DOMDocument::preserveWhiteSpace properties description.
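The steps above can be wrapped in a small helper. This is only a sketch: the function name prettyPrintXml is made up for illustration, and it assumes the input is a well-formed XML string.

```php
<?php
// Hypothetical helper: re-indent a well-formed XML string with DOMDocument.
function prettyPrintXml(string $xml): string
{
    $dom = new DOMDocument();
    // Drop whitespace-only text nodes so formatOutput can re-indent freely.
    $dom->preserveWhiteSpace = false;
    $dom->formatOutput = true;

    // Collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new InvalidArgumentException('Input is not well-formed XML.');
    }
    return $dom->saveXML();
}

$pretty = prettyPrintXml('<root><foo><bar>baz</bar></foo></root>');
echo $pretty;
```

Setting preserveWhiteSpace before loading matters: whitespace-only text nodes that survive parsing would otherwise stop formatOutput from re-indenting the tree.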
This function does what you want without any XML DOM library or anything similar. Just pass the generated XML string into it, and it will parse it and return a new string with indentation and line breaks.
function formatXmlString($xml){
$xml = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $xml);
$token = strtok($xml, "\n");
$result = '';
$pad = 0;
$matches = array();
while ($token !== false) :
if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) :
$indent=0;
elseif (preg_match('/^<\/\w/', $token, $matches)) :
$pad--;
$indent = 0;
elseif (preg_match('/^<\w[^>]*[^\/]>.*$/', $token, $matches)) :
$indent=1;
else :
$indent = 0;
endif;
$line = str_pad($token, strlen($token)+$pad, ' ', STR_PAD_LEFT);
$result .= $line . "\n";
$token = strtok("\n");
$pad += $indent;
endwhile;
return $result;
}
// Here is an example using XMLWriter
$w = new XMLWriter;
$w->openMemory();
$w->setIndent(true);
$w->startElement('foo');
$w->startElement('bar');
$w->writeElement("key", "value");
$w->endElement();
$w->endElement();
echo $w->outputMemory();
// Output
<foo>
 <bar>
  <key>value</key>
 </bar>
</foo>
The first is better if you plan to embed values into the XML; the second is better for humans to read. Neither is good if you really intend to work with XML.
However, if you intend to write a simple fire-and-forget function that takes XML as an input parameter, then I would say use the first method, because you will need to embed parameters at some point.
Personally I would use PHP's SimpleXML extension. It's very easy to use, and its built-in XPath support makes extracting the data returned in XML a dream.
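As a rough illustration of the SimpleXML approach, here is a short sketch. The sample document and variable names are invented for the example.

```php
<?php
// Sample document, made up for illustration.
$xml = '<root><foo id="1"><bar>baz</bar></foo><foo id="2"><bar>qux</bar></foo></root>';
$root = simplexml_load_string($xml);

// Property access walks child elements; casting to string yields the text content.
$first = (string) $root->foo[0]->bar;

// xpath() returns an array of SimpleXMLElement objects.
$bars = array_map('strval', $root->xpath('/root/foo/bar'));

// Attributes are read with array syntax on the element.
$secondId = (string) $root->foo[1]['id'];

echo $first, ' ', implode(',', $bars), ' ', $secondId, "\n";
```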
When adding a string that might contain troublesome characters (eg &, <, >), DOMDocument throws a warning, rather than sanitizing the string.
I'm looking for a succinct way to make strings xml-safe - ideally something that leverages the DOMDocument library.
I'm looking for something better than preg_replace or htmlspecialchars. I see DOMDocument::createTextNode(), but the resulting DOMText object is cumbersome and can't be handed to DOMDocument::createElement().
To illustrate the problem, this code:
<?php
$dom = new DOMDocument;
$dom->formatOutput = true;
$parent = $dom->createElement('rootNode');
$parent->appendChild( $dom->createElement('name', 'this ampersand causes pain & sorrow ') );
$dom->appendChild( $parent );
echo $dom->saveXml();
produces this result (see eval.in):
Warning: DOMDocument::createElement(): unterminated entity reference sorrow in /tmp/execpad-41ee778d3376/source-41ee778d3376 on line 6
<?xml version="1.0"?>
<rootNode>
<name>this ampersand causes pain </name>
</rootNode>
You will have to create the text node and append it. I described the problem in this answer: https://stackoverflow.com/a/22957785/2265374
However you can extend DOMDocument and overload createElement*().
class MyDOMDocument extends DOMDocument {
public function createElement($name, $content = '') {
$node = parent::createElement($name);
if ((string)$content !== '') {
$node->appendChild($this->createTextNode($content));
}
return $node;
}
public function createElementNS($namespace, $name, $content = '') {
$node = parent::createElementNS($namespace, $name);
if ((string)$content !== '') {
$node->appendChild($this->createTextNode($content));
}
return $node;
}
}
$dom = new MyDOMDocument();
$root = $dom->appendChild($dom->createElement('foo'));
$root->appendChild($dom->createElement('bar', 'Company & Son'));
$root->appendChild($dom->createElementNS('urn:bar', 'bar', 'Company & Son'));
$dom->formatOutput = TRUE;
echo $dom->saveXml();
Output:
<?xml version="1.0"?>
<foo>
  <bar>Company &amp; Son</bar>
  <bar xmlns="urn:bar">Company &amp; Son</bar>
</foo>
This is the structure I use to build XML elements, the second part is usually wrapped in a function.
$parent = $document->documentElement; // pick the node we want to append to
$name = 'foo'; // new element name
$content = 'bar < not a tag > <![CDATA[" testing cdata "]]>'; // content
$owner = $parent->ownerDocument ?: $parent; // a DOMDocument itself has no ownerDocument
$element = $owner->createElement($name);
$parent->appendChild($element);
$element->appendChild($owner->createTextNode($content));
My function then returns $element.
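Wrapped in a function, the structure above might look like the following sketch. The name appendTextElement is hypothetical, not part of the DOM extension.

```php
<?php
// Hypothetical helper: append <$name> to $parent, with $content stored as a text node.
function appendTextElement(DOMNode $parent, string $name, string $content): DOMElement
{
    // A DOMDocument has no ownerDocument, so fall back to the node itself.
    $document = $parent instanceof DOMDocument ? $parent : $parent->ownerDocument;
    $element = $parent->appendChild($document->createElement($name));
    if ($content !== '') {
        // createTextNode() escapes &, < and > when the document is serialized.
        $element->appendChild($document->createTextNode($content));
    }
    return $element;
}

$document = new DOMDocument();
$root = appendTextElement($document, 'root', '');
$foo = appendTextElement($root, 'foo', 'bar < not a tag > & more');
$out = $document->saveXML();
echo $out;
```

Because the content never passes through createElement()'s second argument, no "unterminated entity reference" warning is raised, and the text comes back unchanged when the document is read again.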
I have an OFX file downloaded from Citibank. The file has a DTD defined at http://www.ofx.net/DownloadPage/Files/ofx102spec.zip (file OFXBANK.DTD), and the OFX file appears to be valid SGML.
I'm trying to parse it with DOMDocument in PHP 5.4.13, but I get several warnings and the file is not parsed. My code is:
$file = "source/ACCT_013.OFX";
$dtd = "source/ofx102spec/OFXBANK.DTD";
$doc = new DomDocument();
$doc->loadHTMLFile($file);
$doc->schemaValidate($dtd);
$dom->validateOnParse = true;
The OFX file start as:
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE
<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<DTSERVER>20130331073401
<LANGUAGE>SPA
</SONRS>
</SIGNONMSGSRSV1>
<BANKMSGSRSV1>
<STMTTRNRS>
<TRNUID>0
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<STMTRS>
<CURDEF>COP
<BANKACCTFROM> ...
I'm open to installing any program on the server (CentOS) that can be called from PHP.
PS: This class http://www.phpclasses.org/package/5778-PHP-Parse-and-extract-financial-records-from-OFX-files.html doesn't work for me.
Well, first of all, even though XML is a subset of SGML, a valid SGML file is not necessarily a well-formed XML file. XML is stricter and does not use all the features that SGML offers.
As DOMDocument is XML-based (not SGML-based), it is not really compatible.
Besides that problem, please see section 2.2, Open Financial Exchange Headers, in Ofexfin1.doc, which explains that
The contents of an Open Financial Exchange file consist of a simple set of headers followed by contents defined by that header
and further on:
A blank line follows the last header. Then (for type OFXSGML), the SGML-readable data begins with the <OFX> tag.
So locate the first blank line and strip everything up to it. Then load the SGML part into DOMDocument by converting the SGML into XML first:
$source = fopen('file.ofx', 'r');
if (!$source) {
throw new Exception('Unable to open OFX file.');
}
// skip headers of OFX file
$headers = array();
$charsets = array(
1252 => 'WINDOWS-1252',
);
while(!feof($source)) {
$line = trim(fgets($source));
if ($line === '') {
break;
}
list($header, $value) = explode(':', $line, 2);
$headers[$header] = $value;
}
$buffer = '';
// dead-cheap SGML to XML conversion
// see as well http://www.hanselman.com/blog/PostprocessingAutoClosedSGMLTagsWithTheSGMLReader.aspx
while(!feof($source)) {
$line = trim(fgets($source));
if ($line === '') continue;
$line = iconv($charsets[$headers['CHARSET']], 'UTF-8', $line);
if (substr($line, -1, 1) !== '>') {
list($tag) = explode('>', $line, 2);
$line .= '</' . substr($tag, 1) . '>';
}
$buffer .= $line ."\n";
}
// use DOMDocument with non-standard recover mode
$doc = new DOMDocument();
$doc->recover = true;
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
$save = libxml_use_internal_errors(true);
$doc->loadXML($buffer);
libxml_use_internal_errors($save);
echo $doc->saveXML();
This code-example then outputs the following (re-formatted) XML which also shows that DOMDocument loaded the data properly:
<?xml version="1.0"?>
<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0</CODE>
<SEVERITY>INFO</SEVERITY>
</STATUS>
<DTSERVER>20130331073401</DTSERVER>
<LANGUAGE>SPA</LANGUAGE>
</SONRS>
</SIGNONMSGSRSV1>
<BANKMSGSRSV1>
<STMTTRNRS>
<TRNUID>0</TRNUID>
<STATUS>
<CODE>0</CODE>
<SEVERITY>INFO</SEVERITY>
</STATUS>
<STMTRS><CURDEF>COP</CURDEF><BANKACCTFROM> ...</BANKACCTFROM>
</STMTRS>
</STMTTRNRS>
</BANKMSGSRSV1>
</OFX>
I do not know whether this can then be validated against the DTD; maybe it works. Additionally, this fragile conversion assumes that each tag's value is written on the same line as the tag, with only a single element per line. If the SGML is not laid out that way, the conversion will break.
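To try the tag-closing step in isolation, it can be pulled out into a function that works on a string. This is a sketch under the same one-element-per-line assumption; the name closeOfxLeafTags and the sample input are made up, and header stripping and charset conversion are left out.

```php
<?php
// Hypothetical helper: add closing tags to OFX leaf elements such as "<CODE>0".
// Assumes one element per line, the same limitation noted above.
function closeOfxLeafTags(string $sgml): string
{
    $lines = [];
    foreach (preg_split('/\R/', $sgml) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        // "<TAG>value" with no closing tag on the line: append "</TAG>".
        if (substr($line, -1) !== '>' && preg_match('/^<([^\/>]+)>.+$/', $line, $m)) {
            $line .= '</' . $m[1] . '>';
        }
        $lines[] = $line;
    }
    return implode("\n", $lines);
}

$sgml = "<OFX>\n<STATUS>\n<CODE>0\n<SEVERITY>INFO\n</STATUS>\n</OFX>";
$xml = closeOfxLeafTags($sgml);

// The result is well-formed, so DOMDocument loads it without recover mode.
$doc = new DOMDocument();
$loaded = $doc->loadXML($xml);
echo $xml, "\n";
```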
Simplest OFX parse into an array with easy access to all values and transactions.
function parseOFX($ofx) {
$OFXArray=explode("<",$ofx);
$a=array();
foreach ($OFXArray as $v) {
$pair=explode(">",$v);
if (isset($pair[1])) {
if ($pair[1]!=NULL) {
if (isset($a[$pair[0]])) {
if (is_array($a[$pair[0]])) {
$a[$pair[0]][]=$pair[1];
} else {
$temp=$a[$pair[0]];
$a[$pair[0]]=array();
$a[$pair[0]][]=$temp;
$a[$pair[0]][]=$pair[1];
}
} else {
$a[$pair[0]]=$pair[1];
}
}
}
}
return $a;
}
I use this:
$source = utf8_encode(file_get_contents('a.ofx'));
//add end tag
$source = preg_replace('#^<([^>]+)>([^\r\n]+)\r?\n#mU', "<$1>$2</$1>\n", $source);
//skip header
$source = substr($source, strpos($source,'<OFX>'));
//convert to array
$xml = simplexml_load_string($source);
$array = json_decode(json_encode($xml),true);
print_r($array);
What I tried and what doesn't work:
Input:
$d = new DOMDocument();
$d->formatOutput = true;
// Out of my control:
$someEl = $d->createElementNS('http://example.com/a', 'a:some');
// Under my control:
$envelopeEl = $d->createElementNS('http://example.com/default',
'envelope');
$d->appendChild($envelopeEl);
$envelopeEl->appendChild($someEl);
echo $d->saveXML();
$someEl->prefix = null;
echo $d->saveXML();
Output is invalid XML after substitution:
<?xml version="1.0"?>
<envelope xmlns="http://example.com/default">
<a:some xmlns:a="http://example.com/a"/>
</envelope>
<?xml version="1.0"?>
<envelope xmlns="http://example.com/default">
<:some xmlns:a="http://example.com/a" xmlns:="http://example.com/a"/>
</envelope>
Note that <a:some> may have children. One solution would be
to create a new <some>, and copy all children from <a:some> to <some>. Is
that the way to go?
This is really an interesting question. My first idea was to clone the <a:some> node, remove the xmlns:a attribute, remove <a:some>, and insert the clone. But this does not work, as PHP does not allow removing the xmlns:a attribute like a regular attribute.
After some struggling with PHP's DOM methods I started to google the problem and found this comment in the PHP documentation. The user suggests writing a function that clones the node manually without its namespace:
<?php
/**
* This function is based on a comment to the PHP documentation.
* See: http://www.php.net/manual/de/domnode.clonenode.php#90559
*/
function cloneNode($node, $doc){
$unprefixedName = preg_replace('/.*:/', '', $node->nodeName);
$nd = $doc->createElement($unprefixedName);
foreach ($node->attributes as $value)
$nd->setAttribute($value->nodeName, $value->value);
if (!$node->childNodes)
return $nd;
foreach($node->childNodes as $child) {
if($child->nodeName == "#text")
$nd->appendChild($doc->createTextNode($child->nodeValue));
else
$nd->appendChild(cloneNode($child, $doc));
}
return $nd;
}
Using it would lead to a code like this:
$xml = '<?xml version="1.0"?>
<envelope xmlns="http://example.com/default">
<a:some xmlns:a="http://example.com/a"/>
</envelope>';
$doc = new DOMDocument();
$doc->loadXML($xml);
$elements = $doc->getElementsByTagNameNS('http://example.com/a', 'some');
$original = $elements->item(0);
$clone = cloneNode($original, $doc);
$doc->documentElement->replaceChild($clone, $original);
$doc->formatOutput = TRUE;
echo $doc->saveXML();
I'm using the W3 validator API, and I get this kind of response:
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Body>
<m:markupvalidationresponse env:encodingStyle="http://www.w3.org/2003/05/soap-encoding" xmlns:m="http://www.w3.org/2005/10/markup-validator">
<m:uri>http://myurl.com/</m:uri>
<m:checkedby>http://validator.w3.org/</m:checkedby>
<m:doctype>-//W3C//DTD XHTML 1.1//EN</m:doctype>
<m:charset>utf-8</m:charset>
<m:validity>false</m:validity>
<m:errors>
<m:errorcount>1</m:errorcount>
<m:errorlist>
<m:error>
<m:line>7</m:line>
<m:col>80</m:col>
<m:message>character data is not allowed here</m:message>
<m:messageid>63</m:messageid>
<m:explanation> <![CDATA[
PAGE HTML IS HERE
]]>
</m:explanation>
<m:source><![CDATA[ HTML AGAIN ]]></m:source>
</m:error>
...
</m:errorlist>
</m:errors>
<m:warnings>
<m:warningcount>0</m:warningcount>
<m:warninglist>
</m:warninglist>
</m:warnings>
</m:markupvalidationresponse>
</env:Body>
</env:Envelope>
How can I extract some variables from there?
I need validity, errorcount and if possible from the list of errors: line, col, and message :)
Is there an easy way to do this?
You can load the XML string into a SimpleXMLElement with simplexml_load_string and then find the elements using XPath. It's important to register the namespaces involved with registerXPathNamespace before using XPath.
$xml = file_get_contents('example.xml'); // $xml should be the XML source string
$doc = simplexml_load_string($xml);
$doc->registerXPathNamespace('m', 'http://www.w3.org/2005/10/markup-validator');
$nodes = $doc->xpath('//m:markupvalidationresponse/m:validity');
$validity = strval($nodes[0]);
echo 'is valid: ', $validity, "\n";
$nodes = $doc->xpath('//m:markupvalidationresponse/m:errors/m:errorcount');
$errorcount = strval($nodes[0]);
echo 'total errors: ', $errorcount, "\n";
$nodes = $doc->xpath('//m:markupvalidationresponse/m:errors/m:errorlist/m:error');
foreach ($nodes as $node) {
$nodes = $node->xpath('m:line');
$line = strval($nodes[0]);
$nodes = $node->xpath('m:col');
$col = strval($nodes[0]);
$nodes = $node->xpath('m:message');
$message = strval($nodes[0]);
echo 'line: ', $line, ', column: ', $col, ' message: ', $message, "\n";
}
You should be using a SOAP library to get this in the first place. There are various options you can try: NuSOAP, PHP's SOAP extension (http://php.net/manual/en/book.soap.php), or Zend Framework, which also has a SOAP client and server. Whatever implementation you use will give you access to the data in some way. Doing a var_dump() on whatever holds the initial response should help you navigate through it.
If you would rather use PHP's DOMDocument class, you don't have to know XPath to get this working. An example:
$url = "http://www.google.com";
$xml = new DOMDocument();
$xml->load("http://validator.w3.org/check?uri=".urlencode($url)."&output=soap12");
$doctype = $xml->getElementsByTagNameNS('http://www.w3.org/2005/10/markup-validator', 'doctype')->item(0)->nodeValue;
$valid = $xml->getElementsByTagNameNS('http://www.w3.org/2005/10/markup-validator', 'validity')->item(0)->nodeValue;
$errorcount = $xml->getElementsByTagNameNS('http://www.w3.org/2005/10/markup-validator', 'errorcount')->item(0)->nodeValue;
$warningcount = $xml->getElementsByTagNameNS('http://www.w3.org/2005/10/markup-validator', 'warningcount')->item(0)->nodeValue;
$errors = $xml->getElementsByTagNameNS('http://www.w3.org/2005/10/markup-validator', 'error');
foreach ($errors as $error) {
echo "<br>line: ".$error->childNodes->item(1)->nodeValue;
echo "<br>col: ".$error->childNodes->item(3)->nodeValue;
echo "<br>message: ".$error->childNodes->item(5)->nodeValue;
}
// item() arguments are odd numbers because the whitespace text between tags is counted as a node.
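If counting child nodes by position feels brittle, DOMXPath can address the elements by name instead. The following is a sketch on a trimmed-down copy of the response from the question; the variable names are illustrative.

```php
<?php
// Trimmed-down sample of the validator response shown in the question.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Body>
<m:markupvalidationresponse xmlns:m="http://www.w3.org/2005/10/markup-validator">
<m:validity>false</m:validity>
<m:errors>
<m:errorcount>1</m:errorcount>
<m:errorlist>
<m:error>
<m:line>7</m:line>
<m:col>80</m:col>
<m:message>character data is not allowed here</m:message>
</m:error>
</m:errorlist>
</m:errors>
</m:markupvalidationresponse>
</env:Body>
</env:Envelope>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
// The prefix used in queries only has to map to the same namespace URI.
$xpath->registerNamespace('m', 'http://www.w3.org/2005/10/markup-validator');

$validity = $xpath->evaluate('string(//m:validity)');
$errorCount = (int) $xpath->evaluate('number(//m:errorcount)');

$errors = [];
foreach ($xpath->query('//m:error') as $error) {
    // Relative expressions are evaluated against the current <m:error> node.
    $errors[] = [
        'line' => (int) $xpath->evaluate('number(m:line)', $error),
        'col' => (int) $xpath->evaluate('number(m:col)', $error),
        'message' => $xpath->evaluate('string(m:message)', $error),
    ];
}
print_r($errors);
```

Selecting by element name keeps the code working regardless of whether whitespace text nodes sit between the children.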