I finally got my XML (pre-)parsing script ready. It parses and checks each element, and saves a set number of elements to a new XML file.
My "problem" is that everything in the new XML file is written unformatted and without line breaks.
while ($xml = $chunk->read()) {
$obj = simplexml_load_string($xml);
// check if ID is in Array
if(!in_array((string)$obj->id, $ids)) {
$chunkCount++;
$xmlData .= '<product>';
foreach($obj as $key => $value) {
$xmlData .= '<'.$key.'>'.$value.'</'.$key.'>';
}
$xmlData .= '</product>\n';
// if $chunkLimit is reached, save to file
if($chunkCount == $chunkLimit) {
$xp = fopen($file = "slices/slice_".$sliceCount.".xml", "w");
fwrite($xp, '<?xml version="1.0" ?>'."");
fwrite($xp, "<item>");
fwrite($xp, $xmlData);
fwrite($xp, "</item>");
fclose($xp);
print "Written ".$file."<br>";
$xmlData = '';
$chunkCount = 0;
$sliceCount++;
}
}
}
How can I make my XML slices look good, with line breaks? I already tried \n, but it simply writes a literal \n to the new file.
The trick is to use " instead of ' so that special characters are parsed as special.
So
$xmlData .= '</product>\n';
should be
$xmlData .= "</product>\n";
You can also use \t for tabs, if you want indentation!
Use the special characters \n for line breaks and \t for tabs.
So:
fwrite($xp, "<item>\n");
fwrite($xp, "\t" . $xmlData."\n");
fwrite($xp, "</item>\n");
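To see the quoting difference in isolation, here is a minimal sketch (the variable names are illustrative only):

```php
<?php
// Single quotes keep the backslash and the letter n as two literal characters.
$single = '</product>\n';
// Double quotes turn \n into a single line feed character.
$double = "</product>\n";

var_dump(strlen($single) - strlen('</product>')); // int(2)
var_dump(strlen($double) - strlen('</product>')); // int(1)
```

The same rule applies to \t and the other escape sequences.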
I am developing a web service using NuSOAP to send XML data, and this time I have to send data with Latin characters like 'ó'. However, when I put such a character in the SOAP client, it stops working. Below is a summary of the code being developed to test sending XML with Latin characters.
This is the summary of the server code being developed:
include_once("nusoap/nusoap.php");
$server = new soap_server();
$server->configureWSDL("PersonImport","urn:PersonImport");
$server->register("PersonImport",array("login" => "xsd:string", 'senha' => 'xsd:string', 'fornecedor' => 'xsd:string'),array("return" => "xsd:string"),"urn:PersonImport","urn:PersonImport#PersonImport");
function PersonImport($login,$senha,$fornecedor) {
//Just for debug purposes
$return = "My login Is <b>".$login . "</b> And My senha Is <b>".$senha."</b> And My fornecedor Is <b>".$fornecedor."</b>.";
// (...) omitted code: XML parsing and response XML generation
return $return;
}
This is the summary of the client code:
<?php
require_once("nusoap/nusoap.php");
$client = new soapclient("example.com?wsdl");
$xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>";
$xml .= "<fornecedor>";
$xml .= "<NAME1>Gian</NAME1>";
$xml .= "<MCOD1>Giancarlo SA</MCOD1>";
$xml .= "<STCD1>80048303000113</STCD1>";
$xml .= "<STCD2>55670185501</STCD2>";
$xml .= "<STCD3>5508150087</STCD3>";
$xml .= "<RG>359730553</RG>";
$xml .= "<STRAS>rua itororó</STRAS>";
$xml .= "<HOUSE_NUM1>81</HOUSE_NUM1>";
$xml .= "<HOUSE_NUM2>301</HOUSE_NUM2>";
$xml .= "<ORT02>Menino Deus</ORT02>";
$xml .= "<PSTLZ>90110290</PSTLZ>";
$xml .= "<REGIO>RS</REGIO>";
$xml .= "<ORT01>Porto Alegre</ORT01>";
$xml .= "<TELF1>32335675</TELF1>";
$xml .= "<TELFX>32335675</TELFX>";
$xml .= "<SMTP_ADDR>teste#teste.com</SMTP_ADDR>";
$xml .= "<ERDAT>2016-10-04</ERDAT>";
$xml .= "<ChangeData>2016-10-04</ChangeData>";
$xml .= "<StartData>2016-10-04</StartData>";
$xml .= "<OffData>2016-10-04</OffData>";
$xml .= "</fornecedor>";
$result = $client->PersonImport("login","password", $xml);
echo $result;
The line
$xml .= "<STRAS>rua itororó</STRAS>";
has a special character. If I remove the 'ó' character it works.
I tried to set encoding on xml:
$xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>";
This worked for me when I had to parse xml with SimpleXML Parser, but it didn't work on soap.
I tried to set header of the page for utf8 or ISO-8859-1 like this:
header("Content-type:text/html; charset=UTF-8");
or:
header ('Content-type: text/html; charset=ISO-8859-1');
I tried to use htmlentities, but the entity for 'ó' is '&oacute;', which contains the special character '&', and then the same problem happens.
The serialize function didn't solve the problem either.
So far I haven't found an answer on Google.
Is it possible to pass latin special characters using nusoap? There must be a way.
Looks like I found a way. Using CDATA I can escape the '&', so I can use htmlentities on strings, like this:
$xml .= "<STRAS><![CDATA[".htmlentities("Rua Itororó")."]]></STRAS>";
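As a rough sketch of the round trip, assuming the receiving side is free to decode the entities again (the variable names are illustrative, and the source file is assumed to be saved as UTF-8):

```php
<?php
// Assumes this source file is saved as UTF-8.
$street = 'Rua Itororó';

// Sending side: turn non-ASCII characters into HTML entities and
// shield the resulting '&' from the XML parser with a CDATA section.
$xml = '<STRAS><![CDATA[' . htmlentities($street, ENT_QUOTES, 'UTF-8') . ']]></STRAS>';

// Receiving side: read the text back and decode the entities.
$node    = simplexml_load_string($xml);
$decoded = html_entity_decode((string) $node, ENT_QUOTES, 'UTF-8');

var_dump($decoded === $street);
```

The payload itself is then plain ASCII, so the charset of the transport no longer matters, but the receiver has to know that it must call html_entity_decode.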
I have an OFX file downloaded from Citibank. The file has a DTD defined at http://www.ofx.net/DownloadPage/Files/ofx102spec.zip (file OFXBANK.DTD), and the OFX file appears to be valid SGML.
I'm trying to parse it with DomDocument in PHP 5.4.13, but I get several warnings and the file is not parsed. My code is:
$file = "source/ACCT_013.OFX";
$dtd = "source/ofx102spec/OFXBANK.DTD";
$doc = new DomDocument();
$doc->loadHTMLFile($file);
$doc->schemaValidate($dtd);
$dom->validateOnParse = true;
The OFX file starts as:
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE
<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<DTSERVER>20130331073401
<LANGUAGE>SPA
</SONRS>
</SIGNONMSGSRSV1>
<BANKMSGSRSV1>
<STMTTRNRS>
<TRNUID>0
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<STMTRS>
<CURDEF>COP
<BANKACCTFROM> ...
I'm open to installing and using any program on the server (CentOS) that can be called from PHP.
P.S.: This class http://www.phpclasses.org/package/5778-PHP-Parse-and-extract-financial-records-from-OFX-files.html doesn't work for me.
Well, first of all, even though XML is a subset of SGML, a valid SGML file need not be a well-formed XML file. XML is stricter and does not use all the features that SGML offers.
As DOMDocument is XML-based (not SGML-based), it is not really compatible with this file.
Besides that problem, please see section 2.2, Open Financial Exchange Headers, in Ofexfin1.doc, which explains that
The contents of an Open Financial Exchange file consist of a simple set of headers followed by contents defined by that header
and further on:
A blank line follows the last header. Then (for type OFXSGML), the SGML-readable data begins with the <OFX> tag.
So locate the first blank line and strip everything up to it. Then load the SGML part into DOMDocument by converting the SGML into XML first:
$source = fopen('file.ofx', 'r');
if (!$source) {
throw new Exception('Unable to open OFX file.');
}
// skip headers of OFX file
$headers = array();
$charsets = array(
1252 => 'WINDOWS-1252',
);
while(!feof($source)) {
$line = trim(fgets($source));
if ($line === '') {
break;
}
list($header, $value) = explode(':', $line, 2);
$headers[$header] = $value;
}
$buffer = '';
// dead-cheap SGML to XML conversion
// see as well http://www.hanselman.com/blog/PostprocessingAutoClosedSGMLTagsWithTheSGMLReader.aspx
while(!feof($source)) {
$line = trim(fgets($source));
if ($line === '') continue;
$line = iconv($charsets[$headers['CHARSET']], 'UTF-8', $line);
if (substr($line, -1, 1) !== '>') {
list($tag) = explode('>', $line, 2);
$line .= '</' . substr($tag, 1) . '>';
}
$buffer .= $line ."\n";
}
// use DOMDocument with non-standard recover mode
$doc = new DOMDocument();
$doc->recover = true;
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
$save = libxml_use_internal_errors(true);
$doc->loadXML($buffer);
libxml_use_internal_errors($save);
echo $doc->saveXML();
This code-example then outputs the following (re-formatted) XML which also shows that DOMDocument loaded the data properly:
<?xml version="1.0"?>
<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0</CODE>
<SEVERITY>INFO</SEVERITY>
</STATUS>
<DTSERVER>20130331073401</DTSERVER>
<LANGUAGE>SPA</LANGUAGE>
</SONRS>
</SIGNONMSGSRSV1>
<BANKMSGSRSV1>
<STMTTRNRS>
<TRNUID>0</TRNUID>
<STATUS>
<CODE>0</CODE>
<SEVERITY>INFO</SEVERITY>
</STATUS>
<STMTRS><CURDEF>COP</CURDEF><BANKACCTFROM> ...</BANKACCTFROM>
</STMTRS>
</STMTTRNRS>
</BANKMSGSRSV1>
</OFX>
I do not know whether this can then be validated against the DTD. Maybe it works. Note also that this fragile conversion relies on each value sitting on the same line as its tag, with only a single element per line; if the SGML is not written that way, the conversion will break.
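The line-based closing rule can be tried on its own with a small sketch. The helper name closeLeafTags and the sample string are hypothetical, introduced only for illustration:

```php
<?php
// Hypothetical helper: closes leaf elements that carry their value on the same line.
function closeLeafTags(string $sgml): string
{
    $out = [];
    foreach (preg_split('/\R/', $sgml) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        // "<CODE>0" does not end with ">", so it is treated as a leaf that needs "</CODE>".
        if (substr($line, -1) !== '>') {
            list($tag) = explode('>', $line, 2);
            $line .= '</' . substr($tag, 1) . '>';
        }
        $out[] = $line;
    }
    return implode("\n", $out);
}

$sample = "<OFX>\n<STATUS>\n<CODE>0\n<SEVERITY>INFO\n</STATUS>\n</OFX>";
$xml = simplexml_load_string(closeLeafTags($sample));
echo $xml->STATUS->CODE, ' ', $xml->STATUS->SEVERITY, "\n"; // prints "0 INFO"
```

A value that itself ends in ">" would be mistaken for a container tag, which is one more way this approach can fail.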
Simplest OFX parse into an array with easy access to all values and transactions.
function parseOFX($ofx) {
$OFXArray=explode("<",$ofx);
$a=array();
foreach ($OFXArray as $v) {
$pair=explode(">",$v);
if (isset($pair[1])) {
if ($pair[1]!=NULL) {
if (isset($a[$pair[0]])) {
if (is_array($a[$pair[0]])) {
$a[$pair[0]][]=$pair[1];
} else {
$temp=$a[$pair[0]];
$a[$pair[0]]=array();
$a[$pair[0]][]=$temp;
$a[$pair[0]][]=$pair[1];
}
} else {
$a[$pair[0]]=$pair[1];
}
}
}
}
return $a;
}
I use this:
$source = utf8_encode(file_get_contents('a.ofx'));
//add end tag
$source = preg_replace('#^<([^>]+)>([^\r\n]+)\r?\n#mU', "<$1>$2</$1>\n", $source);
//skip header
$source = substr($source, strpos($source,'<OFX>'));
//convert to array
$xml = simplexml_load_string($source);
$array = json_decode(json_encode($xml),true);
print_r($array);
I am writing a new record to an XML file with two items: an English word and a Hebrew word.
But the line $newWord->appendChild($prop.$new_line); causes this error:
"Object of class DOMElement could not be converted to string"
The variable $new_line is set with $new_line = "\n";.
What am I missing here? Thanks.
My code is:
<?php
/*$wordH=$_GET['varHeb'];
$wordE=$_GET['varEng'];*/
$wordH="newhebWord";
$wordE="newengWord";
$new_line = "\n";
$doc='';
if(!$doc)
{
$doc = new DOMDocument();
// we want a nice output
$doc->formatOutput = true;
$doc->load('Dictionary_user.xml');
}
$Dictionary_user = $doc->documentElement;
$newWord = $doc->createElement('newWord');
$prop = $doc->createElement('Heb', $wordH);
$newWord->appendChild($prop.$new_line);
$prop = $doc->createElement('Eng',$wordE);
$newWord->appendChild($prop.$new_line);
$Dictionary_user->childNodes->item(0)->parentNode->insertBefore($newWord,$Dictionary_user->childNodes->item(0));
header("Content-type: text/xml");
$doc->save("Dictionary_user.xml");
echo $doc->saveXML();
?>
You don't need to append a newline; you are dealing with a real data structure (a DOMDocument), not a string.
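A minimal sketch of what that could look like, using an in-memory sample document in place of Dictionary_user.xml (its structure here is an assumption):

```php
<?php
// Sample document standing in for Dictionary_user.xml (assumed structure).
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // drop existing indentation so formatOutput can re-indent
$doc->formatOutput = true;
$doc->loadXML('<Dictionary><newWord><Heb>a</Heb><Eng>b</Eng></newWord></Dictionary>');

$newWord = $doc->createElement('newWord');
// Append the element objects themselves; no string concatenation is involved.
$newWord->appendChild($doc->createElement('Heb', 'newhebWord'));
$newWord->appendChild($doc->createElement('Eng', 'newengWord'));

// Insert the new record before the current first child of the root element.
$root = $doc->documentElement;
$root->insertBefore($newWord, $root->firstChild);

echo $doc->saveXML();
```

Line breaks and indentation are then left to formatOutput when the document is serialized.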
I'm generating an XML page like so:
header('Content-Type: text/html');
$xmlpage = '<?xml version="1.0" charset="utf-8"?>';
$xmlpage .= '<conv>';
$xmlpage .= '<at>6 January 2012 12:00</at>';
$xmlpage .= '<rate>1.56317</rate>';
$xmlpage .= '<from>';
$xmlpage .= '<code>'.$from.'</code>';
$xmlpage .= '<curr>Pound Sterling</curr>';
$xmlpage .= '<loc>UK</loc>';
$xmlpage .= '<amnt>'.$amnt.'</amnt>';
$xmlpage .= '</from>';
$xmlpage .= '</conv>';
echo $xmlpage;
When viewing the page source, it looks terrible:
<?xml version="1.0" charset="utf-8"?><conv><at>6 January 2012 12:00</at><rate>1.56317</rate><from><code>USD</code><curr>Pound Sterling</curr><loc>UK</loc><amnt>23</amnt></from><to><code>GBP</code><curr>United States Dollar</curr><loc>USA</loc><amnt>14.73</amnt></to></conv>
How can I make this so it's properly formatted and indented?
Add newlines with \r\n or just \n. The escape sequences only work inside double quotes (""), so either switch the strings to double quotes and escape the double quotes inside them (\") or replace those with single quotes ('), or append ."\r\n" as a line break, or use a HEREDOC.
Building your XML with an XML generator like the built-in SimpleXML will prevent these sorts of problems, along with numerous others, and is usually far easier than building it by hand with strings.
You could:
Do it yourself by adding whitespace characters to your strings (\n, \t).
Output all your XML with a HEREDOC
You could create or even generate a DOMDocument and use saveXML()
The first two are quick and dirty (heredoc's better). The latter is more robust, but more code.
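A sketch of the DOMDocument route, with placeholder values for $from and $amnt (it assumes the values contain no unescaped '&'):

```php
<?php
// $from and $amnt are placeholders for the values used in the question.
$from = 'USD';
$amnt = '23';

$doc = new DOMDocument('1.0', 'utf-8');
$doc->formatOutput = true; // ask libxml to add line breaks and indentation

$conv = $doc->appendChild($doc->createElement('conv'));
$conv->appendChild($doc->createElement('at', '6 January 2012 12:00'));
$conv->appendChild($doc->createElement('rate', '1.56317'));

$fromNode = $conv->appendChild($doc->createElement('from'));
$fromNode->appendChild($doc->createElement('code', $from));
$fromNode->appendChild($doc->createElement('curr', 'Pound Sterling'));
$fromNode->appendChild($doc->createElement('loc', 'UK'));
$fromNode->appendChild($doc->createElement('amnt', $amnt));

$xmlpage = $doc->saveXML();
echo $xmlpage;
```

The nesting of the output follows the element tree, so the indentation cannot drift out of sync with the structure.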
Use a HEREDOC. It'll be far easier to read than repeated string concatenation, allows tabs and multiple lines, and does variable interpolation for you:
$xmlpage = <<<EOL
<?xml version="1.0" charset="utf-8"?>
<conv>
<at>6 January 2012 12:00</at>
<rate>1.56317</rate>
<from>
<code>$from</code>
<curr>Pound Sterling</curr>
<loc>UK</loc>
<amnt>$amnt</amnt>
</from>
</conv>
EOL;
Use a stylesheet and an XML viewer to view it.
Add a \n at the end of every string you append to $xmlpage. You should be able to view it properly after the echo.
e.g.
$xmlpage = "<?xml version=\"1.0\" charset=\"utf-8\"?>\n";
$xmlpage .= "<conv>\n";
$xmlpage .= "<at>6 January 2012 12:00</at>\n";
$xmlpage .= "<rate>1.56317</rate>\n";
The simplest way would be to add the appropriate whitespace to the beginning of the strings, and the newlines to the ends.
$xmlpage = '<?xml version="1.0" charset="utf-8"?>' . "\n";
$xmlpage .= '<conv>' . "\n";
$xmlpage .= "\t" . '<at>6 January 2012 12:00</at>' . "\n";
$xmlpage .= "\t" . '<rate>1.56317</rate>' . "\n";
$xmlpage .= '<from>' . "\n";
$xmlpage .= "\t" . '<code>'.$from.'</code>' . "\n";
$xmlpage .= "\t" . '<curr>Pound Sterling</curr>' . "\n";
$xmlpage .= "\t" . '<loc>UK</loc>' . "\n";
$xmlpage .= "\t" . '<amnt>'.$amnt.'</amnt>' . "\n";
$xmlpage .= '</from>' . "\n";
$xmlpage .= '</conv>';
Or something along those lines, depending on your desired output.
Here's my prettify function, which formats for output. You can modify it to suit your needs.
function prettifyXML( $xml )
{
// Break our XML up into sections of newlines.
$xml = preg_replace( '/(<[^\/][^>]*?[^\/]>)/', "\n" . '\1', $xml );
$xml = preg_replace( '/(<\/[^\/>]*>|<[^\/>]*?\/>)/', '\1' . "\n", $xml );
$xml = str_replace( "\n\n", "\n", $xml );
$xml_chunks = explode( "\n", $xml );
$indent_depth = 0;
$open_tag_regex = '/<[^\/\?][^>]*>/';
$close_tag_regex = '/(<\/[^>]*>|<[^>]*\/>)/';
// Fix the indenting.
foreach ( $xml_chunks as $index => $xml_chunk )
{
$close_tag_count = preg_match( $close_tag_regex, $xml_chunk );
$open_tag_count = preg_match( $open_tag_regex, $xml_chunk );
if ( $open_tag_count >= $close_tag_count )
{
$temp_indent_depth = $indent_depth;
}
else
{
$temp_indent_depth = $indent_depth - $close_tag_count;
}
$xml_chunks[ $index ] = str_repeat( "\t", $temp_indent_depth ) . $xml_chunk;
$indent_depth += $open_tag_count - $close_tag_count;
}
$xml = implode( "\n", $xml_chunks );
// Add tokens for attributes and values.
$attribute_regex = '/([\w:]+\="[^"]*")/';
$value_regex = '/>([^<]*)</';
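// The three tokens must differ, otherwise the replacements below cannot be told apart.
$value_span_token = '%%VALUE_OPEN%%';
$attribute_span_token = '%%ATTR_OPEN%%';
$span_close_token = '%%SPAN_CLOSE%%';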
$xml = preg_replace( $value_regex, '>' . $value_span_token . '\1' . $span_close_token . '<', $xml );
$xml = preg_replace( $attribute_regex, $attribute_span_token . '\1' .$span_close_token, $xml );
$xml = htmlentities( $xml );
// Replace the tokens that we added previously with their HTML counterparts.
$xml = str_replace( $value_span_token, '<span class="value">', $xml );
$xml = str_replace( $attribute_span_token, '<span class="attribute">', $xml );
$xml = str_replace( $span_close_token, '</span>', $xml );
return $xml;
}
It's been relatively well tested to handle edge cases, though it's not highly efficient because it's only for viewing logs.
I'm using the W3 validator API, and I get this kind of response:
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Body>
<m:markupvalidationresponse env:encodingStyle="http://www.w3.org/2003/05/soap-encoding" xmlns:m="http://www.w3.org/2005/10/markup-validator">
<m:uri>http://myurl.com/</m:uri>
<m:checkedby>http://validator.w3.org/</m:checkedby>
<m:doctype>-//W3C//DTD XHTML 1.1//EN</m:doctype>
<m:charset>utf-8</m:charset>
<m:validity>false</m:validity>
<m:errors>
<m:errorcount>1</m:errorcount>
<m:errorlist>
<m:error>
<m:line>7</m:line>
<m:col>80</m:col>
<m:message>character data is not allowed here</m:message>
<m:messageid>63</m:messageid>
<m:explanation> <![CDATA[
PAGE HTML IS HERE
]]>
</m:explanation>
<m:source><![CDATA[ HTML AGAIN ]]></m:source>
</m:error>
...
</m:errorlist>
</m:errors>
<m:warnings>
<m:warningcount>0</m:warningcount>
<m:warninglist>
</m:warninglist>
</m:warnings>
</m:markupvalidationresponse>
</env:Body>
</env:Envelope>
How can I extract some variables from there?
I need validity, errorcount and if possible from the list of errors: line, col, and message :)
Is there an easy way to do this?
You can load the XML string into a SimpleXMLElement with simplexml_load_string and then find the elements using XPath. It's important to register the namespaces involved with registerXPathNamespace before using XPath.
$xml = file_get_contents('example.xml'); // $xml should be the XML source string
$doc = simplexml_load_string($xml);
$doc->registerXPathNamespace('m', 'http://www.w3.org/2005/10/markup-validator');
$nodes = $doc->xpath('//m:markupvalidationresponse/m:validity');
$validity = strval($nodes[0]);
echo 'is valid: ', $validity, "\n";
$nodes = $doc->xpath('//m:markupvalidationresponse/m:errors/m:errorcount');
$errorcount = strval($nodes[0]);
echo 'total errors: ', $errorcount, "\n";
$nodes = $doc->xpath('//m:markupvalidationresponse/m:errors/m:errorlist/m:error');
foreach ($nodes as $node) {
$nodes = $node->xpath('m:line');
$line = strval($nodes[0]);
$nodes = $node->xpath('m:col');
$col = strval($nodes[0]);
$nodes = $node->xpath('m:message');
$message = strval($nodes[0]);
echo 'line: ', $line, ', column: ', $col, ' message: ', $message, "\n";
}
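As an alternative sketch without XPath, SimpleXML's children() method can select a namespace by URI while walking the tree. The XML below is a trimmed-down copy of the response in the question, and the variable names are illustrative:

```php
<?php
// Trimmed-down sample of the validator response from the question.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Body>
<m:markupvalidationresponse xmlns:m="http://www.w3.org/2005/10/markup-validator">
<m:validity>false</m:validity>
<m:errors>
<m:errorcount>1</m:errorcount>
<m:errorlist>
<m:error>
<m:line>7</m:line>
<m:col>80</m:col>
<m:message>character data is not allowed here</m:message>
</m:error>
</m:errorlist>
</m:errors>
</m:markupvalidationresponse>
</env:Body>
</env:Envelope>
XML;

$nsEnv = 'http://www.w3.org/2003/05/soap-envelope';
$nsM   = 'http://www.w3.org/2005/10/markup-validator';

$doc      = simplexml_load_string($xml);
// children() switches the namespace used for the following property accesses.
$response = $doc->children($nsEnv)->Body->children($nsM)->markupvalidationresponse;

$validity   = (string) $response->validity;
$errorcount = (int) $response->errors->errorcount;

$errors = [];
foreach ($response->errors->errorlist->error as $error) {
    $errors[] = [
        'line'    => (int) $error->line,
        'col'     => (int) $error->col,
        'message' => (string) $error->message,
    ];
}

echo 'is valid: ', $validity, ', total errors: ', $errorcount, "\n";
print_r($errors);
```

Collecting the errors into a plain array keeps the rest of the code independent of SimpleXML.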
You should be using a SOAP library to get this in the first place. There are various options you can try: NuSOAP, PHP's SOAP extension (http://php.net/manual/en/book.soap.php), and Zend Framework, which also has a SOAP client and server. Whatever implementation you use will allow you to get the data in some way. Doing a var_dump() on whatever holds the initial response should help you navigate through it.
If you would rather use PHP's DOMDocument class, you don't have to know XPath to get this working. An example:
$url = "http://www.google.com";
$xml = new DOMDocument();
$xml->load("http://validator.w3.org/check?uri=".urlencode($url)."&output=soap12");
$doctype = $xml->getElementsByTagNameNS('http://www.w3.org/2005/10/markup-validator', 'doctype')->item(0)->nodeValue;
$valid = $xml->getElementsByTagNameNS('http://www.w3.org/2005/10/markup-validator', 'validity')->item(0)->nodeValue;
$errorcount = $xml->getElementsByTagNameNS('http://www.w3.org/2005/10/markup-validator', 'errorcount')->item(0)->nodeValue;
$warningcount = $xml->getElementsByTagNameNS('http://www.w3.org/2005/10/markup-validator', 'warningcount')->item(0)->nodeValue;
$errors = $xml->getElementsByTagNameNS('http://www.w3.org/2005/10/markup-validator', 'error');
foreach ($errors as $error) {
echo "<br>line: ".$error->childNodes->item(1)->nodeValue;
echo "<br>col: ".$error->childNodes->item(3)->nodeValue;
echo "<br>message: ".$error->childNodes->item(5)->nodeValue;
}
// The item() indexes are odd numbers because the whitespace text nodes between tags are counted as items too.