Create a XML document using the DOM object with white characters - php

I have a question: I am trying to create a XML file using DomDocument and I would like to have this output:
<?xml version="1.0" encoding="UTF-8"?>
<winstrom version="1.0">
<main_tag>
<child_tag>example</child_tag>
</main_tag>
<winstrom>
The problem is with the second row - if I write it as below then the output is "Invalid Character Error". I guess it is not allowed to have space characters there... However I need it like this, so what are the options?
$dom = new DomDocument('1.0', 'UTF-8');
$root = $dom->createElement('winstrom version=1.0');
$dom->appendChild($root);
$item = $dom->createElement('hlavni_tag');
$root2->appendChild($item);
$text = $dom->createTextNode('example');
$item->appendChild($text);
$dom->formatOutput = true;
echo $dom->saveXML();

There seems to be a misunderstanding of what an XML element is and how it differs from attributes.
Try this code:
<?php
$dom = new DomDocument('1.0', 'UTF-8');
$root = $dom->createElement('winstrom');
$root->setAttribute("version","1.0");
$dom->appendChild($root);
$root2 = $dom->createElement("main_tag"); //You forgot this part
$root->appendChild($root2);
$item = $dom->createElement('hlavni_tag'); //Should it be "child_tag"?
$root2->appendChild($item);
$text = $dom->createTextNode('example');
$item->appendChild($text);
$dom->formatOutput = true;
echo $dom->saveXML();

Related

Duplicate xml namespace declarations php DOMDocument

I use PHP DOMDocument to generate xml. Sometimes namespaces are declared only on root element, which is intended behaviour, but sometimes no.
For example:
$xml = new DOMDocument('1.0', 'utf-8');
$ns = "http://ns.com";
$otherNs = "http://otherns.com";
$docs = $xml->createElementNS($ns, "ns:Documents");
$doc = $xml->createElementNS($otherNs, "ons:Document");
$innerElement = $xml->createElementNS($otherNs, "ons:innerElement", "someValue");
$doc->appendChild($innerElement);
$docs->appendChild($doc);
$xml->appendChild($docs);
$xml->formatOutput = true;
$xml->save("dom");
I expect:
<?xml version="1.0" encoding="UTF-8"?>
<ns:Documents xmlns:ns="http://ns.com" xmlns:ons="http://otherns.com">
<ons:Document>
<ons:innerElement>someValue</ons:innerElement>
</ons:Document>
</ns:Documents>
But got:
<?xml version="1.0" encoding="UTF-8"?>
<ns:Documents xmlns:ns="http://ns.com" xmlns:ons="http://otherns.com">
<ons:Document xmlns:ons="http://otherns.com">
<ons:innerElement>someValue</ons:innerElement>
</ons:Document>
</ns:Documents>
Why declaration of xmlns:ons="http://otherns.com" appears on Document element, but not in <innerElement>? And how to prevent duplicates?
It's very easy. Just add your nodes into document tree.
Additionally you can explicitly create xmlns:XXX atrtribute in root node.
See example:
namespace test;
use DOMDocument;
$xml = new DOMDocument("1.0", "UTF-8");
$ns = "http://ns.com";
$otherNs = "http://otherns.com";
$docs = $xml->createElementNS($ns, "ns:Documents");
$xml->appendChild($docs);
$docs->setAttributeNS('http://www.w3.org/2000/xmlns/', 'xmlns:ons', $otherNs);
$doc = $xml->createElement("ons:Document");
$docs->appendChild($doc);
$innerElement = $xml->createElement("ons:innerElement", "someValue");
$doc->appendChild($innerElement);
$xml->formatOutput = true;
echo $xml->saveXML();
Result:
<?xml version="1.0" encoding="UTF-8"?>
<ns:Documents xmlns:ns="http://ns.com" xmlns:ons="http://otherns.com">
<ons:Document>
<ons:innerElement>someValue</ons:innerElement>
</ons:Document>
</ns:Documents>

save XML file in minified form using DOMDocument

I want to save space by saving the xml file in minified form
for example
<body>
<div>
<p>hello</p>
<div/>
</div>
</body>
it should be saved like this
<body><div><p>hello</p><div/></div></body>
I'm using DOMDocument to create xml file like this
$xml = new DOMDocument("1.0", "UTF-8");
$xml->preserveWhiteSpace = false;
$xml->formatOutput = false;
$feed = $xml->createElement("feed");
$feed = $xml->appendChild($feed);
/*add attribute*/
$feed_attribute = $xml->createAttribute('xmlns:xsi');
$feed_attribute->value = 'http://www.w3.org/2001/XMLSchema-instance';
$feed->appendChild($feed_attribute);
$aggregator = $xml->createElement("aggregator");
$aggregator = $feed->appendChild($aggregator);
$name = $xml->createElement('name', 'test.com');
$aggregator->appendChild($name);
...etc
$xml->save(public_path() .$string, LIBXML_NOEMPTYTAG);
You're already using the right options. DOMDocument::$formatOutput and DOMDocument::$preserveWhiteSpace:
Format Output
DOMDocument::$formatOutput adds indentation whitespace nodes to an XML DOM if saved. (It is disabled by default.)
$document = new DOMDocument();
$body = $document->appendChild($document->createElement('body'));
$div = $body->appendChild($document->createElement('div'));
$div
->appendChild($document->createElement('p'))
->appendChild($document->createTextNode('hello'));
echo "Not Formatted:\n", $document->saveXML();
$document->formatOutput = TRUE;
echo "\nFormatted:\n", $document->saveXML();
Output:
Not Formatted:
<?xml version="1.0"?>
<body><div><p>hello</p></div></body>
Formatted:
<?xml version="1.0"?>
<body>
<div>
<p>hello</p>
</div>
</body>
However it does not indent if here are text child nodes. It tries to avoid changes to the text output of an HTML/XML document. So it will usually not reformat a loaded document with existing indention whitespace nodes.
Preserve White Space
DOMDocument::$preserveWhiteSpace is an option for the parser. If disabled (It is enabled by default) the parser will ignore any text nodes that would consists of only whitespaces. Indentations are text nodes with a linebreak and some spaces or tabs. It can be used to remove indentations from an XML.
$xml = <<<'XML'
<?xml version="1.0"?>
<body>
<div>
<p>hello</p>
</div>
</body>
XML;
$document = new DOMDocument();
$document->preserveWhiteSpace = FALSE;
$document->loadXML($xml);
echo $document->saveXML();
Output:
<?xml version="1.0"?>
<body><div><p>hello</p></div></body>
Try this, you have to use saveXML() instead of save(),
<?php
$xml = new DOMDocument('1.0');
$xml->preserveWhiteSpace = false;
$xml->formatOutput = false;
$root = $xml->createElement('book');
$root = $xml->appendChild($root);
$title = $xml->createElement('title');
$title = $root->appendChild($title);
$text = $xml->createTextNode("This is the \n title");
$text = $title->appendChild($text);
echo "Saving all the document:\n";
$xml_content = $xml->saveXML();
echo $xml_content . "\n";
$xml_content = str_replace(array(">\n", ">\t"), '>', trim($xml_content, "\n"));
echo $xml_content . "\n";
// Write the contents back to the file
$filename = "/tmp/xml_minified.xml";
file_put_contents($filename, $xml_content);
?>

domDocument's formatOutput property writes inline [duplicate]

Here are the codes:
$doc = new DomDocument('1.0');
// create root node
$root = $doc->createElement('root');
$root = $doc->appendChild($root);
$signed_values = array('a' => 'eee', 'b' => 'sd', 'c' => 'df');
// process one row at a time
foreach ($signed_values as $key => $val) {
// add node for each row
$occ = $doc->createElement('error');
$occ = $root->appendChild($occ);
// add a child node for each field
foreach ($signed_values as $fieldname => $fieldvalue) {
$child = $doc->createElement($fieldname);
$child = $occ->appendChild($child);
$value = $doc->createTextNode($fieldvalue);
$value = $child->appendChild($value);
}
}
// get completed xml document
$xml_string = $doc->saveXML() ;
echo $xml_string;
If I print it in the browser I don't get nice XML structure like
<xml> \n tab <child> etc.
I just get
<xml><child>ee</child></xml>
And I want to be utf-8
How is this all possible to do?
You can try to do this:
...
// get completed xml document
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
$xml_string = $doc->saveXML();
echo $xml_string;
You can make set these parameter right after you've created the DOMDocument as well:
$doc = new DomDocument('1.0');
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
That's probably more concise. Output in both cases is (Demo):
<?xml version="1.0"?>
<root>
<error>
<a>eee</a>
<b>sd</b>
<c>df</c>
</error>
<error>
<a>eee</a>
<b>sd</b>
<c>df</c>
</error>
<error>
<a>eee</a>
<b>sd</b>
<c>df</c>
</error>
</root>
I'm not aware how to change the indentation character(s) with DOMDocument. You could post-process the XML with a line-by-line regular-expression based replacing (e.g. with preg_replace):
$xml_string = preg_replace('/(?:^|\G) /um', "\t", $xml_string);
Alternatively, there is the tidy extension with tidy_repair_string which can pretty print XML data as well. It's possible to specify indentation levels with it, however tidy will never output tabs.
tidy_repair_string($xml_string, ['input-xml'=> 1, 'indent' => 1, 'wrap' => 0]);
With a SimpleXml object, you can simply
$domxml = new DOMDocument('1.0');
$domxml->preserveWhiteSpace = false;
$domxml->formatOutput = true;
/* #var $xml SimpleXMLElement */
$domxml->loadXML($xml->asXML());
$domxml->save($newfile);
$xml is your simplexml object
So then you simpleXml can be saved as a new file specified by $newfile
<?php
$xml = $argv[1];
$dom = new DOMDocument();
// Initial block (must before load xml string)
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
// End initial block
$dom->loadXML($xml);
$out = $dom->saveXML();
print_R($out);
Tried all the answers but none worked. Maybe it's because I'm appending and removing childs before saving the XML.
After a lot of googling found this comment in the php documentation. I only had to reload the resulting XML to make it work.
$outXML = $xml->saveXML();
$xml = new DOMDocument();
$xml->preserveWhiteSpace = false;
$xml->formatOutput = true;
$xml->loadXML($outXML);
$outXML = $xml->saveXML();
// ##### IN SUMMARY #####
$xmlFilepath = 'test.xml';
echoFormattedXML($xmlFilepath);
/*
* echo xml in source format
*/
function echoFormattedXML($xmlFilepath) {
header('Content-Type: text/xml'); // to show source, not execute the xml
echo formatXML($xmlFilepath); // format the xml to make it readable
} // echoFormattedXML
/*
* format xml so it can be easily read but will use more disk space
*/
function formatXML($xmlFilepath) {
$loadxml = simplexml_load_file($xmlFilepath);
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($loadxml->asXML());
$formatxml = new SimpleXMLElement($dom->saveXML());
//$formatxml->saveXML("testF.xml"); // save as file
return $formatxml->saveXML();
} // formatXML
Two different issues here:
Set the formatOutput and preserveWhiteSpace attributes to TRUE to generate formatted XML:
$doc->formatOutput = TRUE;
$doc->preserveWhiteSpace = TRUE;
Many web browsers (namely Internet Explorer and Firefox) format XML when they display it. Use either the View Source feature or a regular text editor to inspect the output.
See also xmlEncoding and encoding.
This is a slight variation of the above theme but I'm putting here in case others hit this and cannot make sense of it ...as I did.
When using saveXML(), preserveWhiteSpace in the target DOMdocument does not apply to imported nodes (as at PHP 5.6).
Consider the following code:
$dom = new DOMDocument(); //create a document
$dom->preserveWhiteSpace = false; //disable whitespace preservation
$dom->formatOutput = true; //pretty print output
$documentElement = $dom->createElement("Entry"); //create a node
$dom->appendChild ($documentElement); //append it
$message = new DOMDocument(); //create another document
$message->loadXML($messageXMLtext); //populate the new document from XML text
$node=$dom->importNode($message->documentElement,true); //import the new document content to a new node in the original document
$documentElement->appendChild($node); //append the new node to the document Element
$dom->saveXML($dom->documentElement); //print the original document
In this context, the $dom->saveXML(); statement will NOT pretty print the content imported from $message, but content originally in $dom will be pretty printed.
In order to achieve pretty printing for the entire $dom document, the line:
$message->preserveWhiteSpace = false;
must be included after the $message = new DOMDocument(); line - ie. the document/s from which the nodes are imported must also have preserveWhiteSpace = false.
based on the answer by #heavenevil
This function pretty prints using the browser
function prettyPrintXmlToBrowser(SimpleXMLElement $xml)
{
$domXml = new DOMDocument('1.0');
$domXml->preserveWhiteSpace = false;
$domXml->formatOutput = true;
$domXml->loadXML($xml->asXML());
$xmlString = $domXml->saveXML();
echo nl2br(str_replace(' ', ' ', htmlspecialchars($xmlString)));
}

How to decode HTML entities when saving an XML file?

I have the following code in my PHP script:
$str = '<item></item>';
$xml = new DOMDocument('1.0', 'UTF-8');
$xml->formatOutput = true;
$xml->load('file.xml');
$items = $addon->getElementsByTagName('items')->item(0);
$items->nodeValue = $str;
$xml->save('file.xml');
In the saved file.xml I see the following:
<item><\item>
How can I save it in the XML file without encoding HTML entities?
Use a DOMDocumentFragment:
<?php
$doc = new DOMDocument();
$doc->load('file.xml'); // '<doc><items/></doc>'
$item = $doc->createDocumentFragment();
$item->appendXML('<item id="1">item</item>');
$doc->getElementsByTagName('items')->item(0)->appendChild($item);
$doc->save('file.xml');
If you're appending to the root element, use $doc->documentElement->appendChild($item); instead of getElementsByTagName.

Self-closing tags using createElement

I need to add a self-closing tag to XML file with DOM in PHP, but I don't know how, because standardly, this tag looks like this:
<tag></tag>
But it should look like this:
<tag/>
DOM will do that automatically for you
$dom = new DOMDocument;
$dom->appendChild($dom->createElement('foo'));
echo $dom->saveXml();
will give by default
<?xml version="1.0"?>
<foo/>
unless you do
$dom = new DOMDocument;
$dom->appendChild($dom->createElement('foo'));
echo $dom->saveXml($dom, LIBXML_NOEMPTYTAG);
which would then give
<?xml version="1.0" encoding="UTF-8"?>
<foo></foo>
Just pass a node param to DOMDocument::saveXML in order to output only a specific node, without any XML declaration:
$doc = new \DOMDocument('1.0', 'UTF-8');
$doc->preserveWhiteSpace = false;
$doc->formatOutput = false;
$node = $doc->createElement('foo');
// Trimming the default carriage return char from output
echo trim($doc->saveXML($node));
will give
<foo/>
not containing any new line / carriage return ending char.

Categories