DOMDocument::loadXML() for parts of XML - php

I have simple XML templates like these:
structure.xml :
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<document>
<book>first book</book>
<book>second book</book>
((other_books))
</document>
book_element.xml :
<book>((name))</book>
And this test :
<?php
Header("Content-type: text/xml; charset=UTF-8");
class XMLTemplate extends DOMDocument
{
private $_content_storage;
private $_filepath;
private $_tags;
public function XMLTemplate( $sFilePath )
{
if( !file_exists( $sFilePath ) ) throw new Exception("file not found");
$this->_filepath = $sFilePath;
$this->_tags = [];
$this->_content_storage = file_get_contents( $this->_filepath );
}
public function Get()
{
$this->merge();
$this->loadXML( $this->_content_storage );
return $this->saveXML();
}
public function SetTag( $sTagName, $sReplacement )
{
$this->_tags[ $sTagName ] = $sReplacement;
}
private function merge()
{
foreach( $this->_tags as $k=>$v)
{
$this->_content_storage = preg_replace(
"/\({2}". $k ."\){2}/i",
$v,
$this->_content_storage
);
}
$this->_content_storage = preg_replace(
"/\({2}[a-z0-9_\-]+\){2}/i",
"",
$this->_content_storage
);
}
}
$aBooks = [
"troisième livre",
"quatrième livre"
];
$Books = "";
foreach( $aBooks as $bookName )
{
$XMLBook = new XMLTemplate("book_element.xml");
$XMLBook->SetTag( "name", $bookName );
$Books .= $XMLBook->Get();
}
$XMLTemplate = new XMLTemplate("test.xml");
$XMLTemplate->SetTag("other_books", $Books);
echo $XMLTemplate->Get();
?>
This gives me the following error:
Warning: DOMDocument::loadXML(): XML declaration allowed only at the start of the document in Entity, line: 5
This is because the loadXML() method automatically adds the declaration to the content, but I need to inject parts of XML into the final template as shown above. How can I disable this automatic addition and use my own declaration? Or is there another way to work around the problem?

If you want to save the document you'd like to merge without the XML declaration, save the document element instead of the whole document.
Both variants are shown in the following example code:
$doc = new DOMDocument();
$doc->loadXML('<root><child/></root>');
echo "The whole doc:\n\n";
echo $doc->saveXML();
echo "\n\nThe root element only:\n\n";
echo $doc->saveXML($doc->documentElement);
The output is as follows:
The whole doc:
<?xml version="1.0"?>
<root><child/></root>
The root element only:
<root><child/></root>
This should probably already be enough for you. Additionally, libxml has a constant that is said to control whether or not the XML declaration is output, but I have never used it:
LIBXML_NOXMLDECL (integer)
Drop the XML declaration when saving a document
Note: Only available in Libxml >= 2.6.21
From: http://php.net/libxml.constants
See the link for additional options; you might want to use one or another of them in the future.
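Applied to the templates from the question, the idea could look roughly like the sketch below. The renderFragment() helper and the inline template string are assumptions made for illustration; they are not part of the asker's XMLTemplate class.

```php
<?php
// Hypothetical helper: parse a fragment and serialize it without a declaration.
function renderFragment(string $xml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    // Passing the root element makes saveXML() serialize only that node,
    // so no XML declaration is prepended to the result.
    return $doc->saveXML($doc->documentElement);
}

$books = '';
foreach (['troisième livre', 'quatrième livre'] as $name) {
    $escaped = htmlspecialchars($name, ENT_XML1 | ENT_QUOTES);
    $books .= renderFragment('<book>' . $escaped . '</book>');
}

// Inline stand-in for structure.xml (assumption for this sketch).
$template = '<?xml version="1.0" encoding="UTF-8"?><document>((other_books))</document>';

$final = new DOMDocument();
$final->loadXML(str_replace('((other_books))', $books, $template));
$output = $final->saveXML();
echo $output;
```

Only the final document is saved as a whole, so the declaration appears exactly once, at the start.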

Related

PHP XML generation, chain `appendChild()`

I'm generating an XML file via PHP and I'm doing it this way:
$dom = new DOMDocument();
$root = $dom->createElement('Root');
...
// some node definitions here etc
$root->appendChild($product);
$root->appendChild($quantity);
$root->appendChild($measureUnit);
$root->appendChild($lineNumber);
...
$dom->appendChild($root);
$dom->save( '/some/dir/some-name.xml');
It all worked well until I reached a part where I need to append, say, N child nodes. That means calling appendChild() N times too, which results in a very long PHP script that is hard to maintain.
I know we can split the main script into smaller files for better maintenance, but is there a better way to 'chain' the appendChild() calls to save a lot of written lines, or is there some magic function such as 'appendChildren' available?
This is my first time using the DOMDocument class; I hope someone can shed some light on this.
Thank you
You can nest the DOMDocument::createElement() into DOMNode::appendChild() calls and chain child nodes or text content assignments.
Since PHP 8.0 DOMNode::append() can be used to append multiple nodes and strings.
$document = new DOMDocument();
// nest createElement inside appendChild
$document->appendChild(
// store node in variable
$root = $document->createElement('root')
);
// chain textContent assignment to appendChild
$root
->appendChild($document->createElement('product'))
->textContent = 'Example';
// use append to add multiple nodes
$root->append(
$measureUnit = $document->createElement('measureUnit'),
$quantity = $document->createElement('quantity'),
);
$measureUnit->textContent = 'cm';
$quantity->textContent = '42';
$document->formatOutput = true;
echo $document->saveXML();
Output:
<?xml version="1.0"?>
<root>
<product>Example</product>
<measureUnit>cm</measureUnit>
<quantity>42</quantity>
</root>
For reusable and maintainable parts, I usually use an interface:
interface XMLAppendable {
public function appendTo(DOMElement $parent): void;
}
class YourXMLPart implements XMLAppendable {
private $_product;
private $_unit;
private $_quantity;
public function __construct(string $product, string $unit, int $quantity) {
$this->_product = $product;
$this->_unit = $unit;
$this->_quantity = $quantity;
}
public function appendTo(DOMElement $parent): void {
$document = $parent->ownerDocument;
$parent
->appendChild($document->createElement('product'))
->textContent = $this->_product;
$parent
->appendChild($document->createElement('measureUnit'))
->textContent = $this->_unit;
$parent
->appendChild($document->createElement('quantity'))
->textContent = $this->_quantity;
}
}
$document = new DOMDocument();
// nest createElement inside appendChild
$document->appendChild(
// store node in variable
$root = $document->createElement('root')
);
$part = new YourXMLPart('Example', 'cm', 42);
$part->appendTo($root);
$document->formatOutput= true;
echo $document->saveXML();
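When the child nodes are simple name/text pairs, a data-driven loop is another way to avoid N hand-written appendChild() calls, and it also works before PHP 8.0. The appendChildren() function below is a hypothetical helper written for this sketch; it is not part of the DOM API.

```php
<?php
// Hypothetical helper (not a DOM method): append one child element
// per entry of a name => text map.
function appendChildren(DOMElement $parent, array $children): void
{
    $document = $parent->ownerDocument;
    foreach ($children as $name => $text) {
        $child = $parent->appendChild($document->createElement($name));
        $child->textContent = (string) $text;
    }
}

$document = new DOMDocument();
$root = $document->appendChild($document->createElement('root'));
appendChildren($root, [
    'product'     => 'Example',
    'measureUnit' => 'cm',
    'quantity'    => 42,
]);
$document->formatOutput = true;
$xml = $document->saveXML();
echo $xml;
```

A map like this cannot hold two children with the same name; a list of [name, text] pairs would be needed for that case.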

PHP DOM: How do I replace the DocumentType node in my XML document?

I'm writing a script to clean up the so-called HTML document that MS Word creates when you Save As "Web Page, Filtered". I want the resulting document to be valid XHTML1.
The first thing I want to do is to change the !DOCTYPE so it will be XHTML 1.0 Strict instead of ...4.0 Transitional.
I wrote code that looked as if it should work, but when I run it I get a segmentation fault from PHP. At first, I thought this was occurring in the save function, but after adding some echo statements for debugging I now think that the problem is at the places marked {{1}} and {{2}} in the code (below).
Here's what I think is going on: at {{1}} I am iterating through the DOMNodeList, treating it as if it were an ordinary array that I can traverse with foreach.
But at {{2}} I change the parent's subnode list. I suspect this breaks my foreach: either the DOMNodeList or my foreach pointer becomes invalid.
So what is the "right" way to make changes to a DOM tree while you're traversing it? I came up with two possible options:
Copy the DOMNodeList into an ordinary array:
$nodelist = [];
foreach ($node->childNodes as $subnode) {
$nodelist[] = $subnode;
// Or perhaps an object that contains the appropriate code and parameters for the change I want to make
}
foreach ($nodelist as $subnode) {
// make the appropriate change
}
Traverse the DOM tree, but do not make any changes. Instead, create an array of all the places where I want to make changes. When finished, go through that array and make the changes.
Maybe there's some "official" way of doing this?
The relevant parts of my code below:
<?php
$dom = new DOMDocument();
$dom->loadHTMLFile($htmFName);
$trav = new DOMTraverser($dom);
$storyParms = new StoryParams("some string");
$callback = new StoryDocCallback($htmFName);
$trav->traverse($callback, $storyParms);
$dom->save("y");
class DOMTraverser
{
private $docNode;
private $callback;
private $param;
public function __construct(DOMNode $node)
{
$this->docNode = $node;
}
public function traverse(GeneralCallBack $cb, $param)
{
$this->callback = $cb;
$this->param = $param;
$this->traverseNode($this->docNode);
}
public function traverseNode($node)
{
$this->callback->callBefore($node, $this->param);
if ($node->hasChildNodes()) {
foreach ($node->childNodes as $subnode) { // {{1}}
if($subnode != null) {
$this->traverseNode($subnode);
}
}
}
}
}
class StoryDocCallback implements GeneralCallback
{
public function callbefore($node, $param)
{
$name = $node->nodeName;
if (is_a($node, "DOMDocumentType")) {
$this->repairDocType($node);
return;
}
...
}
protected function repairDocType(DOMNode $node)
{
$impl = new DomImplementation();
$rootName = "html";
$pubID = "-//W3C//DTD XHTML 1.0 Strict//EN";
$sysID = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";
$newDocType = $impl->createDocumentType($rootName, $pubID, $sysID);
$parent = $node->parentNode;
$rc = $parent->replaceChild($newDocType, $node); // {{2}}
assert($rc != false);
}
...
}
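The first option from the question (copying the live DOMNodeList into an ordinary array before changing the tree) can be sketched with iterator_to_array(). The tiny document and the "x-" renaming are assumptions chosen only to keep the example short; the sketch does not address replacing a DOMDocumentType node specifically.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<root><a/><b/><c/></root>');
$root = $dom->documentElement;

// Take a snapshot of the live node list. Later changes to the tree
// no longer affect what the loop iterates over.
$snapshot = iterator_to_array($root->childNodes);

foreach ($snapshot as $child) {
    // Replace every child with a new element whose name is prefixed with "x-".
    $replacement = $dom->createElement('x-' . $child->nodeName);
    $root->replaceChild($replacement, $child);
}

$result = $dom->saveXML($root);
echo $result;
```

Because the loop walks the array rather than childNodes, each replaceChild() call is safe even though it changes the parent's child list.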

Pass XML node as parameter on object instantiation and then calling subnodes from it

I want to know if passing an XML node and then calling upon a method to access it is legal syntax in PHP. I tried converting to string, but that didn't work.
What am I doing wrong?
What would be the best/simplest alternative?
XML
<user>
<widgets>
<widget>Widget 1</widget>
<stuff>
<morestuff>Things</morestuff>
</stuff>
<stuff>
<morestuff>Things</morestuff>
</stuff>
<widget>Widget 2</widget>
</widgets>
</user>
PHP
<?php
$xmlfile = 'widgets/widgets_files/widgets.xml';
$widgets = array();
$user = new SimpleXMLElement($xmlfile, NULL, true);
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom = dom_import_simplexml($user)->ownerDocument;
foreach ($user->widgets->widget as $widget) {
$new_widget = new Widget($widget); //Where the node gets passed
array_push($widgets, $new_widget);
}
//For example
$new_widget[0]->set_subnodes();
$new_widget[0]->get_subnodes();
class Widget {
private $widget;
private $stuffArray = array();
public function __construct($widget) {
$this->widget = $widget;
}
public function set_subnodes() {
foreach ($this->widget->stuff->morestuff as $morestuff => $value) {
$this->stuffArray[$morestuff] = $value;
}
}
public function get_subnodes() {
foreach ($this->stuffArray as $stuff) {
echo$stuff;
}
}
}
It is indeed possible to pass XML objects as parameters to objects and to call methods on them, but there are a number of errors in your code which are stopping it from working. In particular, the XML that you are using isn't the structure that you think it is--the stuff and morestuff nodes are not children of widget, so none of the actions that you're trying to perform with them will work. Here's a corrected version of the XML and some PHP code that does what I think you're trying to do above:
$widgets = array();
# you can load your code from a file, obviously--for the purposes of the example,
# I'm loading mine using a function.
$sxe = simplexml_load_string( get_my_xml() );
foreach ($sxe->widgets->widget as $widget) {
$new_widget = new Widget($widget); // Where the node gets passed
array_push($widgets, $new_widget);
}
// For example
foreach ($widgets as $w) {
$w->set_subnodes();
$w->get_subnodes();
}
function get_my_xml() {
return <<<XML
<user>
<widgets>
<widget>Widget 1
<stuff>
<morestuff>Things</morestuff>
</stuff>
<stuff>
<morestuff>Other Things</morestuff>
</stuff>
</widget>
<widget>Widget 2
<stuff>
<morestuff>Widget Two's Things</morestuff>
</stuff>
<stuff>
<morestuff>Widget Two's Other Things</morestuff>
</stuff>
</widget>
</widgets>
</user>
XML;
}
The Widget object:
class Widget {
private $widget;
private $stuffArray = array();
public function __construct($widget) {
$this->widget = $widget;
}
public function set_subnodes() {
# put all the "morestuff" nodes into the stuffArray
foreach ($this->widget->xpath("stuff/morestuff") as $ms) {
print "pushing $ms on to array" . PHP_EOL;
array_push($this->stuffArray, $ms);
}
}
public function get_subnodes() {
foreach ($this->stuffArray as $stuff) {
print "Running get_subnodes: got $stuff" . PHP_EOL;
}
}
}
Output:
pushing Things on to array
pushing Other Things on to array
Running get_subnodes: got Things
Running get_subnodes: got Other Things
pushing Widget Two's Things on to array
pushing Widget Two's Other Things on to array
Running get_subnodes: got Widget Two's Things
Running get_subnodes: got Widget Two's Other Things
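To see why the XML from the question does not behave as expected, a short check like the following can help. It is only a sketch, with the question's XML inlined and condensed; the variable names are made up for this example.

```php
<?php
// The structure from the question: <stuff> is a sibling of <widget>, not a child.
$original = <<<XML
<user>
<widgets>
<widget>Widget 1</widget>
<stuff><morestuff>Things</morestuff></stuff>
<stuff><morestuff>Things</morestuff></stuff>
<widget>Widget 2</widget>
</widgets>
</user>
XML;

$sxe = simplexml_load_string($original);

// Number of <stuff> elements below the first <widget>.
$underWidget = count($sxe->widgets->widget[0]->stuff);
// Number of <stuff> elements directly below <widgets>.
$underWidgets = count($sxe->widgets->stuff);

echo "stuff under first widget: $underWidget", PHP_EOL;
echo "stuff directly under widgets: $underWidgets", PHP_EOL;
```

The first count is 0 and the second is 2, which is why iterating over $this->widget->stuff->morestuff finds nothing with the original structure.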

How do I dynamically create a PHP SimpleXMLElement Object while keeping current properties?

I am reading in an XML file, which gives me a SimpleXMLElement object representation of the XML. I am going to take an array and feed new values into that object. I don't know in advance what is going to be in that array.
If I were to brute-force this, I would do something like this:
//Solution 1: Brute Force
//Just creating an array and value for purposes of demonstration.
$arOfData = array( 0 => "theFirstNode", 1 => "theSecondNode", 2 => "theThirdNode" );
$value = "The XML Node Value";
$simpleXml->$arOfData[0]->$arOfData[1]->$arOfData[2] = $value;
//The next best thing I can think of doing is something like this.
//Solution 2: Semi-brute force
//
foreach($this->arrayData as $key => $value) {
$xmlNodes = explode( '-', $key);
$numNodes = count($xmlNodes);
switch($numNodes) {
case 1:
$simpleXml->$xmlNodes[0] = $value;
break;
case 2:
$simpleXml->$xmlNodes[0]->$xmlNodes[1] = $value;
break;
case 3:
$simpleXml->$xmlNodes[0]->$xmlNodes[1]->$xmlNodes[2] = $value;
break;
case 4:
$simpleXml->$xmlNodes[0]->$xmlNodes[1]->$xmlNodes[2]->$xmlNodes[3] = $value;
break;
case 5:
$simpleXml->$xmlNodes[0]->$xmlNodes[1]->$xmlNodes[2]->$xmlNodes[3]->$xmlNodes[4] = $value;
break;
}
}
Note: this solution takes the array key, explodes it on dashes into an array of node names, and uses the array value as the new XML value, so don't let that distract you.
The problem with solution #2 is: what happens when we get an XML node that is nested deeper than five levels? It won't be stuffed into the new object we are creating. It's also not very elegant ;). I am not sure how to do this in a more recursive manner.
As you already wrote in your question, this has to be dynamic because you do not know the number of parent elements.
You need to dig a little deeper into how simplexml works to get this done.
But first, let me suggest a different notation: not the minus sign you have, but a slash, as in a path:
first/second/third
This is also common in XPath, and I think it speaks for itself. Also, the minus sign can be part of an element name, but the slash cannot, so this is a bit more robust.
Before I show you how you can easily access that <third> element node to set its value, let's first look at some assignment basics in simplexml.
To access and set this element-node in a SimpleXMLElement see the following example:
$xml = new SimpleXMLElement('<root><first><second><third/></second></first></root>');
$element = $xml->first->second->third;
$element[0] = "value";
This is pretty straightforward, but you can see two things here:
The <third> element already exists in the document.
The code uses a simplexml self-reference ([0]), which allows setting the XML value of the element the variable refers to (and not the variable itself). This is specific to how SimpleXMLElement works.
The second point also contains the solution to the problem of how to deal with non-existent elements. $element[0] is NULL if the element does not exist:
$xml = new SimpleXMLElement('<root><first><second/></first></root>');
$element = $xml->first->second->third;
var_dump($element[0]); # NULL
So let's try to conditionally add the third element in case it does not exist:
if ($xml->first->second->third[0] === NULL) {
$xml->first->second->third = "";
}
This does solve that problem. So the only thing left to do is to do that in an iterative fashion for all parts of the path:
first/second/third
To keep this easy, create a function for this:
/**
* Modify an element's value specified by a string-path.
*
* @param SimpleXMLElement $parent
* @param string $path
* @param string $value (optional)
*
* @return SimpleXMLElement the modified element-node
*/
function simplexml_deep_set(SimpleXMLElement $parent, $path, $value = '')
{
### <mocked> to be removed later: ###
if ($parent->first->second->third[0] === NULL) {
$parent->first->second->third = "";
}
$element = $parent->first->second->third;
### </mocked> ###
$element[0] = $value;
return $element;
}
Because the function is mocked, it can be used directly:
$xml = new SimpleXMLElement('<root><first><second/></first></root>');
simplexml_deep_set($xml, "first/second/third", "The XML Node Value");
$xml->asXML('php://output');
And this works:
<?xml version="1.0"?>
<root><first><second><third>The XML Node Value</third></second></first></root>
Now remove the mock. First insert the explode, as you already have it. Then all that needs to be done is to walk along each step of the path and create the element if it does not exist yet. In the end, $element will be the element to modify:
$steps = explode('/', $path);
$element = $parent;
foreach ($steps as $step)
{
if ($element->{$step}[0] === NULL) {
$element->$step = '';
}
$element = $element->$step;
}
This foreach is needed to replace the mock with a working version. Compare with the full function definition at a glance:
function simplexml_deep_set(SimpleXMLElement $parent, $path, $value = '')
{
$steps = explode('/', $path);
$element = $parent;
foreach ($steps as $step)
{
if ($element->{$step}[0] === NULL) {
$element->$step = "";
}
$element = $element->$step;
}
$element[0] = $value;
return $element;
}
Let's try something crazier to test it out:
$xml = new SimpleXMLElement('<root><first><second/></first></root>');
simplexml_deep_set($xml, "first/second/third", "The XML Node Value");
simplexml_deep_set(
$xml, "How/do/I/dynamically/create/a/php/simplexml/object/while/keeping/current/properties"
, "The other XML Node Value"
);
$xml->asXML('php://output');
Example-Output (beautified):
<?xml version="1.0"?>
<root>
<first>
<second>
<third>The XML Node Value</third>
</second>
</first>
<How>
<do>
<I>
<dynamically>
<create>
<a>
<php>
<simplexml>
<object>
<while>
<keeping>
<current>
<properties>The other XML Node Value</properties>
</current>
</keeping>
</while>
</object>
</simplexml>
</php>
</a>
</create>
</dynamically>
</I>
</do>
</How>
</root>
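To connect this back to the question's dash-delimited array keys, the keys can be converted to slash paths before calling the function. The sketch below repeats simplexml_deep_set() so that it runs on its own; the $arrayData contents are made-up sample values, not data from the question.

```php
<?php
function simplexml_deep_set(SimpleXMLElement $parent, $path, $value = '')
{
    $steps = explode('/', $path);
    $element = $parent;
    foreach ($steps as $step) {
        // Create the step if it does not exist yet.
        if ($element->{$step}[0] === null) {
            $element->$step = '';
        }
        $element = $element->$step;
    }
    $element[0] = $value;
    return $element;
}

// Hypothetical input in the question's "first-second-third" key format.
$arrayData = [
    'first-second-third'  => 'The XML Node Value',
    'first-second-fourth' => 'Another value',
];

$xml = new SimpleXMLElement('<root/>');
foreach ($arrayData as $key => $value) {
    // Turn the dash notation into the slash path the function expects.
    simplexml_deep_set($xml, str_replace('-', '/', $key), $value);
}
echo $xml->asXML();
```

This only works as long as no element name itself contains a dash, which is the reason the answer prefers the slash notation in the first place.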

Remove parent element, keep all inner children in DOMDocument with saveHTML

I'm manipulating a short HTML snippet with XPath; when I output the changed snippet back with $doc->saveHTML(), DOCTYPE gets added, and HTML / BODY tags wrap the output. I want to remove those, but keep all the children inside by only using the DOMDocument functions. For example:
$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title...</strong></p>
<img src="http://" alt="">
<p>...to be one of those crowning achievements...</p>');
// manipulation goes here
echo htmlentities( $doc->saveHTML() );
This produces:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" ...>
<html><body>
<p><strong>Title...</strong></p>
<img src="http://" alt="">
<p>...to be one of those crowning achievements...</p>
</body></html>
I've attempted some of the simple tricks, such as:
# removes doctype
$doc->removeChild($doc->firstChild);
# <body> replaces <html>
$doc->replaceChild($doc->firstChild->firstChild, $doc->firstChild);
So far that only removes DOCTYPE and replaces HTML with BODY. However, what remains is body > variable number of elements at this point.
How do I remove the <body> tag but keep all of its children, given that they will be structured variably, in a neat - clean way with PHP's DOM manipulation?
UPDATE
Here's a version that doesn't extend DOMDocument, though I think extending is the proper approach, since you're trying to achieve functionality that isn't built-in to the DOM API.
Note: I'm interpreting "clean" and "without workarounds" as keeping all manipulation to the DOM API. As soon as you hit string manipulation, that's workaround territory.
What I'm doing, just as in the original answer, is leveraging DOMDocumentFragment to manipulate multiple nodes all sitting at the root level. There is no string manipulation going on, which to me qualifies as not being a workaround.
$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title...</strong></p><img src="http://" alt=""><p>...to be one of those crowning achievements...</p>');
// Remove doctype node
$doc->doctype->parentNode->removeChild($doc->doctype);
// Remove html element, preserving child nodes
$html = $doc->getElementsByTagName("html")->item(0);
$fragment = $doc->createDocumentFragment();
while ($html->childNodes->length > 0) {
$fragment->appendChild($html->childNodes->item(0));
}
$html->parentNode->replaceChild($fragment, $html);
// Remove body element, preserving child nodes
$body = $doc->getElementsByTagName("body")->item(0);
$fragment = $doc->createDocumentFragment();
while ($body->childNodes->length > 0) {
$fragment->appendChild($body->childNodes->item(0));
}
$body->parentNode->replaceChild($fragment, $body);
// Output results
echo htmlentities($doc->saveHTML());
ORIGINAL ANSWER
This solution is rather lengthy, but it's because it goes about it by extending the DOM in order to keep your end code as short as possible.
sliceOutNode is where the magic happens. Let me know if you have any questions:
<?php
class DOMDocumentExtended extends DOMDocument
{
public function __construct( $version = "1.0", $encoding = "UTF-8" )
{
parent::__construct( $version, $encoding );
$this->registerNodeClass( "DOMElement", "DOMElementExtended" );
}
// This method will need to be removed once PHP supports LIBXML_NOXMLDECL
public function saveXML( DOMNode $node = NULL, $options = 0 )
{
$xml = parent::saveXML( $node, $options );
if( $options & LIBXML_NOXMLDECL )
{
$xml = $this->stripXMLDeclaration( $xml );
}
return $xml;
}
public function stripXMLDeclaration( $xml )
{
return preg_replace( "|<\?xml(.+?)\?>[\n\r]?|i", "", $xml );
}
}
class DOMElementExtended extends DOMElement
{
public function sliceOutNode()
{
$nodeList = new DOMNodeListExtended( $this->childNodes );
$this->replaceNodeWithNode( $nodeList->toFragment( $this->ownerDocument ) );
}
public function replaceNodeWithNode( DOMNode $node )
{
return $this->parentNode->replaceChild( $node, $this );
}
}
class DOMNodeListExtended extends ArrayObject
{
public function __construct( $mixedNodeList )
{
parent::__construct( array() );
$this->setNodeList( $mixedNodeList );
}
private function setNodeList( $mixedNodeList )
{
if( $mixedNodeList instanceof DOMNodeList )
{
$this->exchangeArray( array() );
foreach( $mixedNodeList as $node )
{
$this->append( $node );
}
}
elseif( is_array( $mixedNodeList ) )
{
$this->exchangeArray( $mixedNodeList );
}
else
{
throw new DOMException( "DOMNodeListExtended only supports a DOMNodeList or array as its constructor parameter." );
}
}
public function toFragment( DOMDocument $contextDocument )
{
$fragment = $contextDocument->createDocumentFragment();
foreach( $this as $node )
{
$fragment->appendChild( $contextDocument->importNode( $node, true ) );
}
return $fragment;
}
// Built-in methods of the original DOMNodeList
public function item( $index )
{
return $this->offsetGet( $index );
}
public function __get( $name )
{
switch( $name )
{
case "length":
return $this->count();
break;
}
return false;
}
}
// Load HTML/XML using our fancy DOMDocumentExtended class
$doc = new DOMDocumentExtended();
$doc->loadHTML('<p><strong>Title...</strong></p><img src="http://" alt=""><p>...to be one of those crowning achievements...</p>');
// Remove doctype node
$doc->doctype->parentNode->removeChild( $doc->doctype );
// Slice out html node
$html = $doc->getElementsByTagName("html")->item(0);
$html->sliceOutNode();
// Slice out body node
$body = $doc->getElementsByTagName("body")->item(0);
$body->sliceOutNode();
// Pick your poison: XML or HTML output
echo htmlentities( $doc->saveXML( NULL, LIBXML_NOXMLDECL ) );
echo htmlentities( $doc->saveHTML() );
saveHTML can output a subset of document, meaning we can ask it to output every child node one by one, by traversing body.
$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title...</strong></p>
<img src="http://google.com/img.jpeg" alt="">
<p>...to be one of those crowning achievements...</p>');
// manipulation goes here
// Let's traverse the body and output every child node
$bodyNode = $doc->getElementsByTagName('body')->item(0);
foreach ($bodyNode->childNodes as $childNode) {
echo $doc->saveHTML($childNode);
}
This might not be the most elegant solution, but it works. Alternatively, we can wrap all child nodes inside some container element (say, a div) and output only that container (but the container tag will be included in the output).
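If the result is needed as a string rather than echoed directly, the same loop can be wrapped in a small function. The innerHTML() name is an assumption for this sketch; DOMDocument has no such method in the PHP versions discussed here.

```php
<?php
// Hypothetical helper: concatenate the serialized children of a node,
// leaving out the node's own start and end tags.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title...</strong></p><p>...to be one of those crowning achievements...</p>');

$body = $doc->getElementsByTagName('body')->item(0);
$content = innerHTML($body);
echo $content;
```

As with the loop above, this relies on saveHTML() accepting a node argument, which requires PHP 5.3.6 or later.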
Here's how I've done it:
-- Quick helper function that gives you HTML contents for specific DOM element
function nodeContent($n, $outer=false) {
$d = new DOMDocument('1.0');
$b = $d->importNode($n->cloneNode(true),true);
$d->appendChild($b); $h = $d->saveHTML();
// remove outer tags
if (!$outer) $h = substr($h,strpos($h,'>')+1,-(strlen($n->nodeName)+4));
return $h;
}
-- Find body node in your doc and get its contents
$query = $xpath->query("//body")->item(0);
if($query)
{
echo nodeContent($query);
}
UPDATE 1:
Some extra info: since PHP 5.3.6, DOMDocument->saveHTML() accepts an optional DOMNode parameter, similarly to DOMDocument->saveXML(). You can do:
$xpath = new DOMXPath($doc);
$query = $xpath->query("//body")->item(0);
echo $doc->saveHTML($query);
For older versions, the helper function above does the job.
tl;dr
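requires: PHP >= 5.4.0 for the options parameter; according to the PHP manual, LIBXML_HTML_NOIMPLIED needs Libxml >= 2.7.7 and LIBXML_HTML_NODEFDTD needs Libxml >= 2.7.8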
$doc->loadHTML("<p>test</p>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
explanation
http://php.net/manual/en/domdocument.loadhtml.php
"Since PHP 5.4.0 and Libxml 2.6.0, you may also use the options parameter to specify additional Libxml parameters."
LIBXML_HTML_NOIMPLIED Sets HTML_PARSE_NOIMPLIED flag, which turns off the automatic adding of implied html/body... elements.
LIBXML_HTML_NODEFDTD Sets HTML_PARSE_NODEFDTD flag, which prevents a default doctype being added when one is not found.
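A minimal sketch of these two flags in use follows. The snippet is wrapped in a single div here as a precaution, on the assumption that libxml copes best with one top-level element when the implied html/body elements are turned off; the exact whitespace of the output may vary between libxml versions.

```php
<?php
$doc = new DOMDocument();
// Parse without adding implied <html>/<body> elements or a default doctype.
$doc->loadHTML(
    '<div><p>test</p></div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

// manipulation goes here

$html = $doc->saveHTML();
echo $html;
```

If the wrapper element is unwanted in the final output, its children can be serialized one by one with saveHTML($child), as shown in the earlier answer.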
You have 2 ways to accomplish this:
$content = substr($content, strpos($content, '<html><body>') + 12); // Remove Everything Before & Including The Opening HTML & Body Tags.
$content = substr($content, 0, -14); // Remove Everything After & Including The Closing HTML & Body Tags.
Or even better is this way:
$dom->normalizeDocument();
$content = $dom->saveHTML();
