Extract Button Text Using PHP [duplicate] - php

This question already has answers here:
How to get innerHTML of DOMNode?
(9 answers)
Closed 5 years ago.
How to change the innerHTML of a PHP DOMElement?

Another solution:
1) create new DOMDocumentFragment from the HTML string to be inserted;
2) remove old content of our element by deleting its child nodes;
3) append DOMDocumentFragment to our element.
function setInnerHTML($element, $html)
{
$fragment = $element->ownerDocument->createDocumentFragment();
$fragment->appendXML($html);
while ($element->hasChildNodes())
$element->removeChild($element->firstChild);
$element->appendChild($fragment);
}
Alternatively, we can replace our element with its clean copy and then append DOMDocumentFragment to this clone.
function setInnerHTML($element, $html)
{
$fragment = $element->ownerDocument->createDocumentFragment();
$fragment->appendXML($html);
$clone = $element->cloneNode(); // Get element copy without children
$clone->appendChild($fragment);
$element->parentNode->replaceChild($clone, $element);
}
Test:
$doc = new DOMDocument();
$doc->loadXML('<div><span style="color: green">Old HTML</span></div>');
$div = $doc->getElementsByTagName('div')->item(0);
echo $doc->saveHTML();
setInnerHTML($div, '<p style="color: red">New HTML</p>');
echo $doc->saveHTML();
// Output:
// <div><span style="color: green">Old HTML</span></div>
// <div><p style="color: red">New HTML</p></div>
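Because appendXML() only accepts well-formed XML, it can help to check its return value before removing the old children. The following is a minimal sketch under that assumption; the helper name trySetInnerXml is hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper: replace $element's children only if $xml parses as well-formed XML.
function trySetInnerXml(DOMElement $element, string $xml): bool
{
    $fragment = $element->ownerDocument->createDocumentFragment();

    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $ok = @$fragment->appendXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        return false; // parsing failed, leave the element untouched
    }
    while ($element->hasChildNodes()) {
        $element->removeChild($element->firstChild);
    }
    $element->appendChild($fragment);
    return true;
}

$doc = new DOMDocument();
$doc->loadXML('<div><span>Old</span></div>');
$div = $doc->documentElement;

var_dump(trySetInnerXml($div, '<p>broken<br></p>')); // unclosed <br> is not well-formed XML
var_dump(trySetInnerXml($div, '<p>fine<br/></p>'));  // self-closed <br/> is well-formed
```

Parsing before deleting means a malformed string does not leave the element empty.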

I needed to do this for a project recently and ended up with an extension to DOMElement: http://www.keyvan.net/2010/07/javascript-like-innerhtml-access-in-php/
Here's an example showing how it's used:
<?php
require_once 'JSLikeHTMLElement.php';
$doc = new DOMDocument();
$doc->registerNodeClass('DOMElement', 'JSLikeHTMLElement');
$doc->loadHTML('<div><p>Para 1</p><p>Para 2</p></div>');
$elem = $doc->getElementsByTagName('div')->item(0);
// print innerHTML
echo $elem->innerHTML; // prints '<p>Para 1</p><p>Para 2</p>'
// set innerHTML
$elem->innerHTML = 'FF';
// print document (with our changes)
echo $doc->saveXML();
?>

I think the best thing you can do is come up with a function that will take the DOMElement that you want to change the InnerHTML of, copy it, and replace it.
In very rough PHP:
function replaceElement($el, $newInnerHTML) {
    // Use the element's own document; createElement() treats the second argument as text, not markup
    $newElement = $el->ownerDocument->createElement($el->nodeName, $newInnerHTML);
    $el->parentNode->insertBefore($newElement, $el);
    $el->parentNode->removeChild($el);
    return $newElement;
}
This doesn't take into account attributes and nested structures, but I think this will get you on your way.

I ended up writing this function by combining a few functions from other people on this page. I mostly changed Joanna Goch's version the way Peter Brand suggests, and also added some code from Guest and from other places.
This function does not use an extension and does not use appendXML (which is very picky and fails on even a single unclosed BR tag), and it seems to work well.
function set_inner_html( $element, $content ) {
$DOM_inner_HTML = new DOMDocument();
$internal_errors = libxml_use_internal_errors( true );
$DOM_inner_HTML->loadHTML( mb_convert_encoding( $content, 'HTML-ENTITIES', 'UTF-8' ) );
libxml_use_internal_errors( $internal_errors );
$content_node = $DOM_inner_HTML->getElementsByTagName('body')->item(0);
$content_node = $element->ownerDocument->importNode( $content_node, true );
while ( $element->hasChildNodes() ) {
$element->removeChild( $element->firstChild );
}
$element->appendChild( $content_node );
}
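As far as I can tell, the function above imports the temporary document's body element itself, so the target element ends up with a nested body child. A possible variant imports the children of body instead. This is only a sketch: the name setInnerHtmlFromBody is hypothetical, the explicit html/body wrapper is my own assumption, and character encoding is left aside (plain ASCII input is assumed).

```php
<?php
// Hypothetical helper: set inner HTML by parsing with loadHTML() and importing body's children.
function setInnerHtmlFromBody(DOMElement $element, string $html): void
{
    $tmp = new DOMDocument();

    // Tolerate sloppy HTML such as an unclosed <br> without emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $tmp->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Remove the old content.
    while ($element->hasChildNodes()) {
        $element->removeChild($element->firstChild);
    }

    $body = $tmp->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return; // nothing was parsed
    }
    // importNode() returns copies, so the temporary document is not modified while iterating.
    foreach ($body->childNodes as $child) {
        $element->appendChild($element->ownerDocument->importNode($child, true));
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<div id="x"><span>Old</span></div>');
$div = $doc->getElementsByTagName('div')->item(0);
setInnerHtmlFromBody($div, '<p>New</p><br>');
echo $doc->saveHTML($div);
```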

It seems that appendXML does not always work, for example if you try to append XML with 3 levels. Here is the function I wrote that works in those cases (you want to set $content as innerHTML to $element):
function setInnerHTML($DOM, $element, $content) {
    $DOMInnerHTML = new DOMDocument();
    $DOMInnerHTML->loadHTML($content);
    $contentNode = $DOMInnerHTML->getElementsByTagName('body')->item(0)->firstChild;
    $contentNode = $DOM->importNode($contentNode, true);
    $element->appendChild($contentNode);
    return $element;
}

Have a look at this library PHP Simple HTML DOM Parser http://simplehtmldom.sourceforge.net/
It looks pretty straightforward. You can change the innertext property of your elements. It might help.

Here is a replace-by-class function I just wrote:
It replaces the inner HTML of every element with a given class. You can also specify the node type, e.g. div, p, a.
function replaceInnerHtmlByClass($html, $replace=null, $class=null, $nodeType=null){
if(!$nodeType){ $nodeType = '*'; }
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//{$nodeType}[contains(concat(' ', normalize-space(@class), ' '), '$class')]");
foreach($nodes as $node) {
while($node->childNodes->length){
$node->removeChild($node->firstChild);
}
$fragment = $dom->createDocumentFragment();
$fragment->appendXML($replace);
$node->appendChild($fragment);
}
return $dom->saveHTML($dom->documentElement);
}
Here is another function I wrote to remove nodes with a specific class but preserving the inner html.
Setting replace to true will discard the inner html.
Setting replace to any other content will replace the inner html with the provided content.
function stripTagsByClass($html, $class=null, $nodeType=null, $replace=false){
if(!$nodeType){ $nodeType = '*'; }
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//{$nodeType}[contains(concat(' ', normalize-space(@class), ' '), '$class')]");
foreach($nodes as $node) {
$innerHTML = '';
$children = $node->childNodes;
foreach($children as $child) {
$tmp = new DOMDocument();
$tmp->appendChild($tmp->importNode($child,true));
$innerHTML .= $tmp->saveHTML();
}
$fragment = $dom->createDocumentFragment();
if($replace !== null && $replace !== false){
if($replace === true){ $replace = ''; }
$innerHTML = $replace;
}
$fragment->appendXML($innerHTML);
$node->parentNode->replaceChild($fragment, $node);
}
return $dom->saveHTML($dom->documentElement);
}
These functions can easily be adapted to use other attributes as the selector.
I only needed it to evaluate the class attribute.
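One thing to watch with the class selector: when the class name passed to contains() is not padded with spaces, it also matches substrings, so "btn" would match "btn-large". A minimal sketch of a whole-token match follows. The helper name queryByClass is hypothetical, and it assumes the class name contains no quote characters.

```php
<?php
// Hypothetical helper: select elements whose class attribute contains $class as a whole token.
function queryByClass(DOMDocument $dom, string $class, string $nodeType = '*')
{
    $xpath = new DOMXPath($dom);
    // Pad both the attribute value and the needle with spaces,
    // so "btn" does not match inside "btn-large".
    $expr = sprintf(
        "//%s[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $nodeType,
        $class
    );
    return $xpath->query($expr);
}

$dom = new DOMDocument();
$dom->loadHTML('<div><p class="btn">a</p><p class="btn-large">b</p><p class="x btn y">c</p></div>');
$matches = queryByClass($dom, 'btn');
echo $matches->length; // counts only the elements that carry "btn" as a separate class
```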

Developing on from Joanna Goch's answer, this function will insert either a text node or an HTML fragment:
function nodeFromContent($node, $content) {
//creates a text node, or dom node if content contains html
$lt = strpos($content, '<');
$gt = strrpos($content, '>');
if (!($lt === false || $gt === false) && $gt > $lt) {
//< followed by > means potentially contains HTML
$DOMInnerHTML = new DOMDocument();
$DOMInnerHTML->loadHTML($content);
$contentNode = $DOMInnerHTML->getElementsByTagName('body')->item(0);
$newNode = $node->ownerDocument->importNode($contentNode, true);
} else {
$newNode = $node->ownerDocument->createTextNode($content);
}
return $newNode;
}
Usage:
$newNode = nodeFromContent($node, $content);
$node->parentNode->insertBefore($newNode, $node);
//or $node->appendChild($newNode) depending on what you require

Here is how you do it:
$doc = new DOMDocument();
$label = $doc->createElement('label');
$label->appendChild($doc->createTextNode('test'));
// $li is assumed to be an existing element that belongs to $doc
$li->appendChild($label);
echo $doc->saveHTML();

function setInnerHTML($DOM, $element, $innerHTML) {
$node = $DOM->createTextNode($innerHTML);
$element->appendChild($node);
}
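Note that createTextNode() inserts its argument as literal text, so markup in the string is not parsed. A short sketch that illustrates this (the sample markup is made up for the demonstration):

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<div></div>');
$div = $doc->getElementsByTagName('div')->item(0);

// createTextNode() stores the string as character data, not as parsed markup.
$div->appendChild($doc->createTextNode('<b>bold?</b>'));

echo $div->childNodes->length, "\n"; // a single text node, no <b> element
echo $doc->saveHTML($div), "\n";     // the angle brackets are escaped on output
```

This makes the approach suitable for plain text, but not for setting HTML content.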

Related

PHP DOM XML retrieve mixed content [duplicate]

What function do you use to get the innerHTML of a given DOMNode in the PHP DOM implementation? Can someone give a reliable solution?
Of course outerHTML will do too.
Compare this updated variant with PHP Manual User Note #89718:
<?php
function DOMinnerHTML(DOMNode $element)
{
$innerHTML = "";
$children = $element->childNodes;
foreach ($children as $child)
{
$innerHTML .= $element->ownerDocument->saveHTML($child);
}
return $innerHTML;
}
?>
Example:
<?php
$dom= new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadHTML($html_string); // load() expects a file path; loadHTML() parses a string
$domTables = $dom->getElementsByTagName("table");
// Iterate over DOMNodeList (Implements Traversable)
foreach ($domTables as $table)
{
echo DOMinnerHTML($table);
}
?>
Here is a version in a functional programming style:
function innerHTML($node) {
return implode(array_map([$node->ownerDocument,"saveHTML"],
iterator_to_array($node->childNodes)));
}
To return the html of an element, you can use C14N():
$dom = new DOMDocument();
$dom->loadHtml($html);
$x = new DOMXpath($dom);
foreach($x->query('//table') as $table){
echo $table->C14N();
}
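Keep in mind that C14N() produces canonical XML rather than HTML, so void elements such as br are written with an explicit end tag. A small comparison sketch (the sample markup is made up for the demonstration):

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<div><p>one<br>two</p></div>');
$div = $dom->getElementsByTagName('div')->item(0);

$canonical = $div->C14N();         // canonical XML serialization
$asHtml    = $dom->saveHTML($div); // HTML serialization

echo $canonical, "\n"; // contains <br></br>
echo $asHtml, "\n";    // contains a plain <br>
```

If the result is meant to be consumed as HTML, saveHTML() with a node argument is probably the safer choice.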
A simplified version of Haim Evgi's answer:
<?php
function innerHTML(\DOMElement $element)
{
$doc = $element->ownerDocument;
$html = '';
foreach ($element->childNodes as $node) {
$html .= $doc->saveHTML($node);
}
return $html;
}
Example usage:
<?php
$doc = new \DOMDocument();
$doc->loadHTML("<body><div id='foo'><p>This is <b>an <i>example</i></b> paragraph<br>\n\ncontaining newlines.</p><p>This is another paragraph.</p></div></body>");
print innerHTML($doc->getElementById('foo'));
/*
<p>This is <b>an <i>example</i></b> paragraph<br>
containing newlines.</p>
<p>This is another paragraph.</p>
*/
There's no need to set preserveWhiteSpace or formatOutput.
Similar to trincot's nice version with array_map and implode, but this time with array_reduce:
return array_reduce(
    iterator_to_array($node->childNodes),
    function ($carry, \DOMNode $child) {
        return $carry . $child->ownerDocument->saveHTML($child);
    },
    '' // start from an empty string so a node without children yields '' rather than null
);
I still don't understand why there is no reduce() method that accepts arrays and iterators alike.
function setnodevalue($doc, $node, $newvalue) {
    // Remove the existing children
    while ($node->childNodes->length > 0) {
        $node->removeChild($node->firstChild);
    }
    if (!empty($newvalue)) {
        // appendXML() expects well-formed XML
        $fragment = $doc->createDocumentFragment();
        $fragment->appendXML(trim($newvalue));
        $node->appendChild($fragment);
    }
}
Here's another approach based on this comment by Drupella on php.net, that worked well for my project. It defines the innerHTML() by creating a new DOMDocument, importing and appending to it the target node, instead of explicitly iterating over child nodes.
InnerHTML
Let's define this helper function:
function innerHTML( \DOMNode $n, $include_target_tag = true ) {
$doc = new \DOMDocument();
$doc->appendChild( $doc->importNode( $n, true ) );
$html = trim( $doc->saveHTML() );
if ( $include_target_tag ) {
return $html;
}
return preg_replace( '#^<' . $n->nodeName .'[^>]*>|</'. $n->nodeName .'>$#', '', $html );
}
where we can include/exclude the outer target tag through the second input argument.
Usage Example
Here we extract the inner HTML for a target tag given by the "first" id attribute:
$html = '<div id="first"><h1>Hello</h1></div><div id="second"><p>World!</p></div>';
$doc = new \DOMDocument();
$doc->loadHTML( $html );
$node = $doc->getElementById( 'first' );
if ( $node instanceof \DOMNode ) {
echo innerHTML( $node, true );
// Output: <div id="first"><h1>Hello</h1></div>
echo innerHTML( $node, false );
// Output: <h1>Hello</h1>
}
Live example:
http://sandbox.onlinephpfunctions.com/code/2714ea116aad9957c3c437d46134a1688e9133b8
Old query, but there is a built-in method to do that. Just pass the target node to DomDocument->saveHtml().
Full example:
$html = '<div><p>ciao questa è una <b>prova</b>.</p></div>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// query() returns a DOMNodeList, so take the first match; with /* you get the div's first child element without the surrounding div tag, without /* you get the div itself
$node = $xpath->query('.//div/*')->item(0);
$innerHtml = $dom->saveHTML($node);
var_dump($innerHtml);
Output: <p>ciao questa è una <b>prova</b>.</p>
For people who want to get the HTML from XPath query, here is my version:
$xpath = new DOMXpath( $my_dom_object );
$DOMNodeList = $xpath->query('//div[contains(@class, "some_custom_class_in_html")]');
if( $DOMNodeList->count() > 0 ) {
$page_html = $my_dom_object->saveHTML( $DOMNodeList->item(0) );
}

PHP DOMDocument get image tag and value with getElementsByTagName [duplicate]


How to parsing HTML Content [duplicate]


php DOM write all child nodes [duplicate]


Change innerHTML of a php DOMElement [duplicate]

