I'm generating an XML file via PHP and I'm doing it this way:
$dom = new DOMDocument();
$root = $dom->createElement('Root');
...
// some node definitions here etc
$root->appendChild($product);
$root->appendChild($quantity);
$root->appendChild($measureUnit);
$root->appendChild($lineNumber);
...
$dom->appendChild($root);
$dom->save( '/some/dir/some-name.xml');
It all worked well until I reached a part where I needed to append, let's say, N child nodes. That meant calling appendChild() N times, which resulted in a very long PHP script that is hard to maintain.
I know we can split the main script into smaller files for easier maintenance, but is there a better way to 'chain' the appendChild() calls so it saves us a lot of written lines, or is there some magic function such as 'appendChildren' available?
This is my first time using the DOMDocument class; I hope someone can shed some light on this.
Thank you
You can nest the DOMDocument::createElement() into DOMNode::appendChild() calls and chain child nodes or text content assignments.
Since PHP 8.0 DOMNode::append() can be used to append multiple nodes and strings.
$document = new DOMDocument();
// nest createElement inside appendChild
$document->appendChild(
// store node in variable
$root = $document->createElement('root')
);
// chain textContent assignment to appendChild
$root
->appendChild($document->createElement('product'))
->textContent = 'Example';
// use append to add multiple nodes
$root->append(
$measureUnit = $document->createElement('measureUnit'),
$quantity = $document->createElement('quantity'),
);
$measureUnit->textContent = 'cm';
$quantity->textContent = '42';
$document->formatOutput = true;
echo $document->saveXML();
Output:
<?xml version="1.0"?>
<root>
<product>Example</product>
<measureUnit>cm</measureUnit>
<quantity>42</quantity>
</root>
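For PHP versions before 8.0, where append() is not available, the same effect can be approximated with a small helper that loops over appendChild(). This is only a sketch: the function name appendChildren() is hypothetical (borrowed from the question) and is not part of the DOM extension.

```php
<?php
/**
 * Hypothetical helper, not part of ext/dom: appends every given node
 * to $parent in order and returns $parent.
 */
function appendChildren(DOMNode $parent, DOMNode ...$children): DOMNode
{
    foreach ($children as $child) {
        $parent->appendChild($child);
    }
    return $parent;
}

$document = new DOMDocument();
// appendChild() returns the appended node, so $root is the new element
$root = $document->appendChild($document->createElement('root'));

appendChildren(
    $root,
    $document->createElement('product', 'Example'),
    $document->createElement('measureUnit', 'cm'),
    $document->createElement('quantity', '42')
);

echo $document->saveXML($root);
```

The variadic parameter requires PHP 5.6 and the type declarations require PHP 7.0, so this covers the versions where append() is missing.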
For reusable and maintainable parts, I usually use an interface:
interface XMLAppendable {
public function appendTo(DOMElement $parent): void;
}
class YourXMLPart implements XMLAppendable {
private $_product;
private $_unit;
private $_quantity;
public function __construct(string $product, string $unit, int $quantity) {
$this->_product = $product;
$this->_unit = $unit;
$this->_quantity = $quantity;
}
public function appendTo(DOMElement $parent): void {
$document = $parent->ownerDocument;
$parent
->appendChild($document->createElement('product'))
->textContent = $this->_product;
$parent
->appendChild($document->createElement('measureUnit'))
->textContent = $this->_unit;
$parent
->appendChild($document->createElement('quantity'))
->textContent = $this->_quantity;
}
}
$document = new DOMDocument();
// nest createElement inside appendChild
$document->appendChild(
// store node in variable
$root = $document->createElement('root')
);
$part = new YourXMLPart('Example', 'cm', 42);
$part->appendTo($root);
$document->formatOutput= true;
echo $document->saveXML();
Related
I'm writing a script to clean up the so-called HTML document that MS Word creates when you Save As "Web Page, Filtered". I want the resulting document to be valid XHTML1.
The first thing I want to do is to change the !DOCTYPE so it will be XHTML 1.0 Strict instead of ...4.0 Transitional.
I wrote code that looked as if it should work, but when I run it I get a segmentation fault from PHP. At first, I thought this was occurring in the save function, but after adding some echo statements for debugging I now think that the problem is at the places marked {{1}} and {{2}} in the code (below).
Here's what I think is going on: at {{1}} I am iterating through the DOMNodeList, treating it as if it were an ordinary array that I can traverse with foreach.
But at {{2}} I change the parent's subnode list. I suspect this breaks my foreach: either the DOMNodeList or my foreach pointer becomes invalid.
So what is the "right" way to make changes to a DOM tree while you're traversing it? I came up with two possible options:
Copy the DOMNodeList into an ordinary array:
$nodelist = [];
foreach ($node->childNodes as $subnode) {
$nodelist[] = $subnode;
// Or perhaps an object that contains the appropriate code and parameters for the change I want to make
}
foreach ($nodelist as $subnode) {
// make the appropriate change
}
Traverse the DOM tree, but do not make any changes. Instead, create an array of all the places where I want to make changes. When finished, go through that array and make the changes.
Maybe there's some "official" way of doing this?
The relevant parts of my code below:
<?php
$dom = new DOMDocument();
$dom->loadHTMLFile($htmFName);
$trav = new DOMTraverser($dom);
$storyParms = new StoryParams("some string");
$callback = new StoryDocCallback($htmFName);
$trav->traverse($callback, $storyParms);
$dom->save("y");
class DOMTraverser
{
private $docNode;
private $callback;
private $param;
public function __construct(DOMNode $node)
{
$this->docNode = $node;
}
public function traverse(GeneralCallBack $cb, $param)
{
$this->callback = $cb;
$this->param = $param;
$this->traverseNode($this->docNode);
}
public function traverseNode($node)
{
$this->callback->callBefore($node, $this->param);
if ($node->hasChildNodes()) {
foreach ($node->childNodes as $subnode) { // {{1}}
if($subnode != null) {
$this->traverseNode($subnode);
}
}
}
}
}
class StoryDocCallback implements GeneralCallback
{
public function callbefore($node, $param)
{
$name = $node->nodeName;
if (is_a($node, "DOMDocumentType")) {
$this->repairDocType($node);
return;
}
...
}
protected function repairDocType(DOMNode $node)
{
$impl = new DomImplementation();
$rootName = "html";
$pubID = "-//W3C//DTD XHTML 1.0 Strict//EN";
$sysID = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";
$newDocType = $impl->createDocumentType($rootName, $pubID, $sysID);
$parent = $node->parentNode;
$rc = $parent->replaceChild($newDocType, $node); // {{2}}
assert($rc != false);
}
...
}
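The first option from the question (copying the live DOMNodeList into an ordinary array before changing anything) can be sketched with iterator_to_array(). The sample document and the element names below are made up for illustration; this is a minimal sketch of the snapshot idea, not a fix for the traverser above.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<list><item>a</item><item>b</item><item>c</item></list>');
$list = $dom->documentElement;

// childNodes is a live list; copy it into a plain array first so that
// replacing nodes cannot disturb the iteration.
$snapshot = iterator_to_array($list->childNodes);

foreach ($snapshot as $child) {
    // Replace every <item> with an <entry> that has the same text
    $entry = $dom->createElement('entry');
    $entry->textContent = $child->textContent;
    $list->replaceChild($entry, $child);
}

echo $dom->saveXML($list);
```

Because the loop runs over the array rather than the live list, the tree can be modified freely inside it.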
I want to know if passing an XML node and then calling upon a method to access it is legal syntax in PHP. I tried converting to string, but that didn't work.
What am I doing wrong?
What would be the best/simplest alternative?
XML
<user>
<widgets>
<widget>Widget 1</widget>
<stuff>
<morestuff>Things</morestuff>
</stuff>
<stuff>
<morestuff>Things</morestuff>
</stuff>
<widget>Widget 2</widget>
</widgets>
</user>
PHP
<?php
$xmlfile = 'widgets/widgets_files/widgets.xml';
$widgets = array();
$user = new SimpleXMLElement($xmlfile, NULL, true);
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom = dom_import_simplexml($user)->ownerDocument;
foreach ($user->widgets->widget as $widget) {
$new_widget = new Widget($widget); //Where the node gets passed
array_push($widgets, $new_widget);
}
//For example
$new_widget[0]->set_subnodes();
$new_widget[0]->get_subnodes();
class Widget {
private $widget;
private $stuffArray = array();
public function __construct($widget) {
$this->widget = $widget;
}
public function set_subnodes() {
foreach ($this->widget->stuff->morestuff as $morestuff => $value) {
$this->stuffArray[$morestuff] = $value;
}
}
public function get_subnodes() {
foreach ($this->stuffArray as $stuff) {
echo$stuff;
}
}
}
It is indeed possible to pass XML objects as parameters to objects and to call methods on them, but there are a number of errors in your code which are stopping it from working. In particular, the XML that you are using isn't the structure that you think it is--the stuff and morestuff nodes are not children of widget, so none of the actions that you're trying to perform with them will work. Here's a corrected version of the XML and some PHP code that does what I think you're trying to do above:
$widgets = array();
# you can load your XML from a file, obviously--for the purposes of the example,
# I'm loading mine using a function.
$sxe = simplexml_load_string( get_my_xml() );
foreach ($sxe->widgets->widget as $widget) {
$new_widget = new Widget($widget); // Where the node gets passed
array_push($widgets, $new_widget);
}
// For example
foreach ($widgets as $w) {
$w->set_subnodes();
$w->get_subnodes();
}
function get_my_xml() {
return <<<XML
<user>
<widgets>
<widget>Widget 1
<stuff>
<morestuff>Things</morestuff>
</stuff>
<stuff>
<morestuff>Other Things</morestuff>
</stuff>
</widget>
<widget>Widget 2
<stuff>
<morestuff>Widget Two's Things</morestuff>
</stuff>
<stuff>
<morestuff>Widget Two's Other Things</morestuff>
</stuff>
</widget>
</widgets>
</user>
XML;
}
The Widget object:
class Widget {
private $widget;
private $stuffArray = array();
public function __construct($widget) {
$this->widget = $widget;
}
public function set_subnodes() {
# put all the "morestuff" nodes into the stuffArray
foreach ($this->widget->xpath("stuff/morestuff") as $ms) {
print "pushing $ms on to array" . PHP_EOL;
array_push($this->stuffArray, $ms);
}
}
public function get_subnodes() {
foreach ($this->stuffArray as $stuff) {
print "Running get_subnodes: got $stuff" . PHP_EOL;
}
}
}
Output:
pushing Things on to array
pushing Other Things on to array
Running get_subnodes: got Things
Running get_subnodes: got Other Things
pushing Widget Two's Things on to array
pushing Widget Two's Other Things on to array
Running get_subnodes: got Widget Two's Things
Running get_subnodes: got Widget Two's Other Things
I have simple XML templates like these:
structure.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<document>
<book>first book</book>
<book>second book</book>
((other_books))
</document>
book_element.xml :
<book>((name))</book>
And this test:
<?php
Header("Content-type: text/xml; charset=UTF-8");
class XMLTemplate extends DOMDocument
{
private $_content_storage;
private $_filepath;
private $_tags;
public function XMLTemplate( $sFilePath )
{
if( !file_exists( $sFilePath ) ) throw new Exception("file not found");
$this->_filepath = $sFilePath;
$this->_tags = [];
$this->_content_storage = file_get_contents( $this->_filepath );
}
public function Get()
{
$this->merge();
$this->loadXML( $this->_content_storage );
return $this->saveXML();
}
public function SetTag( $sTagName, $sReplacement )
{
$this->_tags[ $sTagName ] = $sReplacement;
}
private function merge()
{
foreach( $this->_tags as $k=>$v)
{
$this->_content_storage = preg_replace(
"/\({2}". $k ."\){2}/i",
$v,
$this->_content_storage
);
}
$this->_content_storage = preg_replace(
"/\({2}[a-z0-9_\-]+\){2}/i",
"",
$this->_content_storage
);
}
}
$aBooks = [
"troisième livre",
"quatrième livre"
];
$Books = "";
foreach( $aBooks as $bookName )
{
$XMLBook = new XMLTemplate("book_element.xml");
$XMLBook->SetTag( "name", $bookName );
$Books .= $XMLBook->Get();
}
$XMLTemplate = new XMLTemplate("test.xml");
$XMLTemplate->SetTag("other_books", $Books);
echo $XMLTemplate->Get();
?>
This gives me the error:
Warning: DOMDocument::loadXML(): XML declaration allowed only at the start of the document in Entity, line: 5
This seems to happen because the XML declaration is added automatically to the content of each part, but I need to inject pieces of XML into the final template as shown above. How can I disable this automatic declaration and use my own? Or is there another way to work around the problem?
If you dislike the error and you want to save the document you'd like to merge without the XML declaration, just save the document element instead of the whole document.
See both variants in the following example-code (online-demo):
$doc = new DOMDocument();
$doc->loadXML('<root><child/></root>');
echo "The whole doc:\n\n";
echo $doc->saveXML();
echo "\n\nThe root element only:\n\n";
echo $doc->saveXML($doc->documentElement);
The output is as follows:
The whole doc:
<?xml version="1.0"?>
<root><child/></root>
The root element only:
<root><child/></root>
This should already be helpful for you. Additionally, there is a libxml constant that is said to control whether or not the XML declaration is output, but I have never used it:
LIBXML_NOXMLDECL (integer)
Drop the XML declaration when saving a document
Note: Only available in Libxml >= 2.6.21
From: http://php.net/libxml.constants
See the link for additional options; you might want to use one or another of them in the future.
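Applied to the book templates from the question, the idea could look roughly like the sketch below. It serializes only the root element of each part and injects the result through DOMDocumentFragment::appendXML(), which is a different technique from the regex-based tag replacement in the question. The inline XML strings and the translated book names stand in for the template files and are assumptions.

```php
<?php
$template = new DOMDocument();
$template->loadXML('<document><book>first book</book></document>');

foreach (['third book', 'fourth book'] as $name) {
    $part = new DOMDocument();
    $part->loadXML('<book/>');
    $part->documentElement->textContent = $name;

    // Serialize only the root element, so no XML declaration is emitted
    $xml = $part->saveXML($part->documentElement);

    // Parse the declaration-free string into a fragment of the template
    $fragment = $template->createDocumentFragment();
    $fragment->appendXML($xml);
    $template->documentElement->appendChild($fragment);
}

echo $template->saveXML();
```

Only the final saveXML() call on the whole template emits a declaration, so it appears once, at the start of the document.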
I'm manipulating a short HTML snippet with XPath; when I output the changed snippet back with $doc->saveHTML(), DOCTYPE gets added, and HTML / BODY tags wrap the output. I want to remove those, but keep all the children inside by only using the DOMDocument functions. For example:
$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title...</strong></p>
<img src="http://" alt="">
<p>...to be one of those crowning achievements...</p>');
// manipulation goes here
echo htmlentities( $doc->saveHTML() );
This produces:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" ...>
<html><body>
<p><strong>Title...</strong></p>
<img src="http://" alt="">
<p>...to be one of those crowning achievements...</p>
</body></html>
I've attempted some of the simple tricks, such as:
# removes doctype
$doc->removeChild($doc->firstChild);
# <body> replaces <html>
$doc->replaceChild($doc->firstChild->firstChild, $doc->firstChild);
So far that only removes DOCTYPE and replaces HTML with BODY. However, what remains is body > variable number of elements at this point.
How do I remove the <body> tag but keep all of its children, given that they will be structured variably, in a neat, clean way with PHP's DOM manipulation?
UPDATE
Here's a version that doesn't extend DOMDocument, though I think extending is the proper approach, since you're trying to achieve functionality that isn't built-in to the DOM API.
Note: I'm interpreting "clean" and "without workarounds" as keeping all manipulation to the DOM API. As soon as you hit string manipulation, that's workaround territory.
What I'm doing, just as in the original answer, is leveraging DOMDocumentFragment to manipulate multiple nodes all sitting at the root level. There is no string manipulation going on, which to me qualifies as not being a workaround.
$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title...</strong></p><img src="http://" alt=""><p>...to be one of those crowning achievements...</p>');
// Remove doctype node
$doc->doctype->parentNode->removeChild($doc->doctype);
// Remove html element, preserving child nodes
$html = $doc->getElementsByTagName("html")->item(0);
$fragment = $doc->createDocumentFragment();
while ($html->childNodes->length > 0) {
$fragment->appendChild($html->childNodes->item(0));
}
$html->parentNode->replaceChild($fragment, $html);
// Remove body element, preserving child nodes
$body = $doc->getElementsByTagName("body")->item(0);
$fragment = $doc->createDocumentFragment();
while ($body->childNodes->length > 0) {
$fragment->appendChild($body->childNodes->item(0));
}
$body->parentNode->replaceChild($fragment, $body);
// Output results
echo htmlentities($doc->saveHTML());
ORIGINAL ANSWER
This solution is rather lengthy, but it's because it goes about it by extending the DOM in order to keep your end code as short as possible.
sliceOutNode is where the magic happens. Let me know if you have any questions:
<?php
class DOMDocumentExtended extends DOMDocument
{
public function __construct( $version = "1.0", $encoding = "UTF-8" )
{
parent::__construct( $version, $encoding );
$this->registerNodeClass( "DOMElement", "DOMElementExtended" );
}
// This method will need to be removed once PHP supports LIBXML_NOXMLDECL
public function saveXML( DOMNode $node = NULL, $options = 0 )
{
$xml = parent::saveXML( $node, $options );
if( $options & LIBXML_NOXMLDECL )
{
$xml = $this->stripXMLDeclaration( $xml );
}
return $xml;
}
public function stripXMLDeclaration( $xml )
{
return preg_replace( "|<\?xml(.+?)\?>[\n\r]?|i", "", $xml );
}
}
class DOMElementExtended extends DOMElement
{
public function sliceOutNode()
{
$nodeList = new DOMNodeListExtended( $this->childNodes );
$this->replaceNodeWithNode( $nodeList->toFragment( $this->ownerDocument ) );
}
public function replaceNodeWithNode( DOMNode $node )
{
return $this->parentNode->replaceChild( $node, $this );
}
}
class DOMNodeListExtended extends ArrayObject
{
public function __construct( $mixedNodeList )
{
parent::__construct( array() );
$this->setNodeList( $mixedNodeList );
}
private function setNodeList( $mixedNodeList )
{
if( $mixedNodeList instanceof DOMNodeList )
{
$this->exchangeArray( array() );
foreach( $mixedNodeList as $node )
{
$this->append( $node );
}
}
elseif( is_array( $mixedNodeList ) )
{
$this->exchangeArray( $mixedNodeList );
}
else
{
throw new DOMException( "DOMNodeListExtended only supports a DOMNodeList or array as its constructor parameter." );
}
}
public function toFragment( DOMDocument $contextDocument )
{
$fragment = $contextDocument->createDocumentFragment();
foreach( $this as $node )
{
$fragment->appendChild( $contextDocument->importNode( $node, true ) );
}
return $fragment;
}
// Built-in methods of the original DOMNodeList
public function item( $index )
{
return $this->offsetGet( $index );
}
public function __get( $name )
{
switch( $name )
{
case "length":
return $this->count();
break;
}
return false;
}
}
// Load HTML/XML using our fancy DOMDocumentExtended class
$doc = new DOMDocumentExtended();
$doc->loadHTML('<p><strong>Title...</strong></p><img src="http://" alt=""><p>...to be one of those crowning achievements...</p>');
// Remove doctype node
$doc->doctype->parentNode->removeChild( $doc->doctype );
// Slice out html node
$html = $doc->getElementsByTagName("html")->item(0);
$html->sliceOutNode();
// Slice out body node
$body = $doc->getElementsByTagName("body")->item(0);
$body->sliceOutNode();
// Pick your poison: XML or HTML output
echo htmlentities( $doc->saveXML( NULL, LIBXML_NOXMLDECL ) );
echo htmlentities( $doc->saveHTML() );
saveHTML can output a subset of document, meaning we can ask it to output every child node one by one, by traversing body.
$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title...</strong></p>
<img src="http://google.com/img.jpeg" alt="">
<p>...to be one of those crowning achievements...</p>');
// manipulation goes here
// Let's traverse the body and output every child node
$bodyNode = $doc->getElementsByTagName('body')->item(0);
foreach ($bodyNode->childNodes as $childNode) {
echo $doc->saveHTML($childNode);
}
This might not be the most elegant solution, but it works. Alternatively, we can wrap all child nodes inside some container element (say a div) and output only that container (but the container tag will be included in the output).
Here's how I've done it:
-- Quick helper function that gives you the HTML contents of a specific DOM element
function nodeContent($n, $outer=false) {
$d = new DOMDocument('1.0');
$b = $d->importNode($n->cloneNode(true),true);
$d->appendChild($b); $h = $d->saveHTML();
// remove outer tags
if (!$outer) $h = substr($h,strpos($h,'>')+1,-(strlen($n->nodeName)+4));
return $h;
}
-- Find body node in your doc and get its contents
$query = $xpath->query("//body")->item(0);
if($query)
{
echo nodeContent($query);
}
UPDATE 1:
Some extra info: since PHP 5.3.6, DOMDocument::saveHTML() accepts an optional DOMNode parameter, similarly to DOMDocument::saveXML(). You can do:
$xpath = new DOMXPath($doc);
$query = $xpath->query("//body")->item(0);
echo $doc->saveHTML($query);
For older versions, the helper function above will help.
tl;dr
requires: PHP 5.4.0 and Libxml 2.6.0
$doc->loadHTML("<p>test</p>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
explanation
http://php.net/manual/en/domdocument.loadhtml.php
"Since PHP 5.4.0 and Libxml 2.6.0, you may also use the options parameter to specify additional Libxml parameters."
LIBXML_HTML_NOIMPLIED Sets HTML_PARSE_NOIMPLIED flag, which turns off the automatic adding of implied html/body... elements.
LIBXML_HTML_NODEFDTD Sets HTML_PARSE_NODEFDTD flag, which prevents a default doctype being added when one is not found.
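A slightly fuller sketch of the two flags in use follows. Wrapping the snippet in a single div is my own assumption rather than part of the answer: without the implied html/body elements the parser has only the snippet's own top-level elements to work with, and as far as I know several siblings at the top level may end up nested unexpectedly, so a single wrapper is the cautious choice.

```php
<?php
$doc = new DOMDocument();

// One wrapper element keeps the whole snippet under a single root node.
$doc->loadHTML(
    '<div><p><strong>Title...</strong></p><p>...more text...</p></div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

// manipulation goes here

$html = $doc->saveHTML();
echo $html;
```

The output should contain the div and its children but no doctype, html or body wrapper; the wrapper div itself remains in the output.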
You have 2 ways to accomplish this:
$content = substr($content, strpos($content, '<html><body>') + 12); // Remove Everything Before & Including The Opening HTML & Body Tags.
$content = substr($content, 0, -14); // Remove Everything After & Including The Closing HTML & Body Tags.
Or even better is this way:
$dom->normalizeDocument();
$content = $dom->saveHTML();
When importing an extended DOMElement object with specific properties into a DOMDocument other than the one it was created with, all properties are lost (I guess it doesn't actually copy the node; instead a new node is created for the other document and just the values for the DOMElement class are copied to the new node). What would be the best way to keep the properties available in the imported element?
Here's an example of the problem:
<?php
class DOMExtendedElement extends DOMElement {
private $itsVerySpecialProperty;
public function setVerySpecialProperty($property) {$this->itsVerySpecialProperty = $property;}
}
// First document
$firstDocument = new DOMDocument();
$firstDocument->registerNodeClass("DOMElement", "DOMExtendedElement");
$elm = $firstDocument->createElement("elm");
$elm->setVerySpecialProperty("Hello World!");
var_dump($elm);
// Second document
$secondDocument = new DOMDocument();
var_dump($secondDocument->importNode($elm, true)); // The imported element is a DOMElement and doesn't have any other properties at all
// Third document
$thirdDocument = new DOMDocument();
$thirdDocument->registerNodeClass("DOMElement", "DOMExtendedElement");
var_dump($thirdDocument->importNode($elm, true)); // The imported element is a DOMExtendedElement and does have the extra property but it's empty
?>
There may be a better solution, but you may need to clone the first object:
class DOMExtendedElement extends DOMElement {
private $itsVerySpecialProperty;
public function setVerySpecialProperty($property) {$this->itsVerySpecialProperty = $property;}
public function getVerySpecialProperty() { return $this->itsVerySpecialProperty ?? ''; }
}
// First document
$firstDocument = new DOMDocument();
$firstDocument->registerNodeClass("DOMElement", "DOMExtendedElement");
$elm = $firstDocument->createElement("elm");
$elm->setVerySpecialProperty("Hello World!");
var_dump($elm);
$elm2 = clone $elm;
// Third document
$thirdDocument = new DOMDocument();
$thirdDocument->registerNodeClass("DOMElement", "DOMExtendedElement");
$thirdDocument->importNode($elm2);
var_dump($elm2);
Result :
object(DOMExtendedElement)#2 (1) {
["itsVerySpecialProperty:private"]=>
string(12) "Hello World!"
}
object(DOMExtendedElement)#3 (1) {
["itsVerySpecialProperty:private"]=>
string(12) "Hello World!"
}
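Another option, sketched here under the assumption that the special value can be represented as a string, is to keep it in an XML attribute instead of a PHP property, since attributes belong to the node itself and are copied by importNode() when deep is true. The class name AttributeBackedElement and the attribute name data-special are hypothetical. Note that this changes the serialized document, because the attribute appears in the output.

```php
<?php
// Hypothetical subclass: the "special property" lives in an attribute,
// so it travels with the node when it is imported elsewhere.
class AttributeBackedElement extends DOMElement
{
    public function setVerySpecialProperty(string $value): void
    {
        $this->setAttribute('data-special', $value);
    }

    public function getVerySpecialProperty(): string
    {
        return $this->getAttribute('data-special');
    }
}

$first = new DOMDocument();
$first->registerNodeClass('DOMElement', 'AttributeBackedElement');
$elm = $first->createElement('elm');
$elm->setVerySpecialProperty('Hello World!');

$second = new DOMDocument();
$second->registerNodeClass('DOMElement', 'AttributeBackedElement');
// deep = true so the attributes are copied along with the element
$imported = $second->importNode($elm, true);

echo $imported->getVerySpecialProperty();
```

The target document still has to register the subclass; otherwise the imported node is a plain DOMElement and the value is only reachable through getAttribute().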