Preventing DOMDocument::loadHTML() from converting entities

I have a string from which I'm trying to extract list items. I'd like to extract the text and any subnodes; however, DOMDocument converts the entities to characters instead of leaving them in their original state.
I've tried setting DOMDocument::resolveExternals and DOMDocument::substituteEntities to false, but this has no effect. Note that I'm running PHP 5.2.17 on Windows 7.
Example code is:
$example = '<ul><li>text</li>'.
'<li>&frac12; of this is <strong>strong</strong></li></ul>';
echo 'To be converted:'.PHP_EOL.$example.PHP_EOL;
$doc = new DOMDocument();
$doc->resolveExternals = false;
$doc->substituteEntities = false;
$doc->loadHTML($example);
$domNodeList = $doc->getElementsByTagName('li');
$count = $domNodeList->length;
for ($idx = 0; $idx < $count; $idx++) {
$value = trim(_get_inner_html($domNodeList->item($idx)));
/* remainder of processing and storing in database */
echo 'Saved '.$value.PHP_EOL;
}
function _get_inner_html( $node ) {
$innerHTML= '';
$children = $node->childNodes;
foreach ($children as $child) {
$innerHTML .= $child->ownerDocument->saveXML( $child );
}
return $innerHTML;
}
&frac12; ends up getting converted to ½ (the single-character UTF-8 version, not the entity version), which is not the desired format.

Solution for PHP versions before 5.3.6:
$html =<<<HTML
<ul><li>text</li>
<li>&frac12; of this is <strong>strong</strong></li></ul>
HTML;
$doc = new DOMDocument();
$doc->resolveExternals = false;
$doc->substituteEntities = false;
$doc->loadHTML($html);
foreach ($doc->getElementsByTagName('li') as $node)
{
echo htmlentities(iconv('UTF-8', 'ISO-8859-1', $node->nodeValue)), "\n";
}
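As a hedged variation on the snippet above: nodeValue comes back from DOMDocument as UTF-8, so htmlentities() can be given that charset directly instead of going through iconv() to ISO-8859-1 first (a conversion that fails on characters outside Latin-1). This is only a sketch; the helper name text_to_entities is made up here, and the syntax assumes PHP 7 or later rather than the 5.2 setup from the question.

```php
<?php
// Hypothetical helper: turn UTF-8 text back into named HTML entities.
function text_to_entities(string $text): string
{
    // ENT_QUOTES also encodes single quotes; 'UTF-8' tells htmlentities()
    // how to interpret the input bytes.
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

echo text_to_entities("\u{00BD} of this"), "\n";
```

The design choice is simply to skip the intermediate charset conversion, so text that has no ISO-8859-1 representation is still encoded where an entity exists.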

Based on the answer provided by ajreal, I've expanded the example variable to handle more cases, and changed _get_inner_html() to make recursive calls and handle the entity conversion for text nodes.
It's probably not the best answer, since it makes some assumptions about the elements (such as no attributes). But since my particular needs don't require attributes to be carried across (yet.. I'm sure my sample data will throw that one at me later on), this solution works for me.
$example = '<ul><li>text</li>'.
'<li>&frac12; of this is <strong>strong</strong></li>'.
'<li>Entity <strong attr="3">in &frac12; tag</strong></li>'.
'<li>Nested nodes <strong attr="3">in &frac12; <em>tag &frac12;</em></strong></li>'.
'</ul>';
echo 'To be converted:'.PHP_EOL.$example.PHP_EOL;
$doc = new DOMDocument();
$doc->resolveExternals = true;
$doc->substituteEntities = false;
$doc->loadHTML($example);
$domNodeList = $doc->getElementsByTagName('li');
$count = $domNodeList->length;
for ($idx = 0; $idx < $count; $idx++) {
$value = trim(_get_inner_html($domNodeList->item($idx)));
/* remainder of processing and storing in database */
echo 'Saved '.$value.PHP_EOL;
}
function _get_inner_html( $node ) {
$innerHTML= '';
$children = $node->childNodes;
foreach ($children as $child) {
echo 'Node type is '.$child->nodeType.PHP_EOL;
switch ($child->nodeType) {
case 3:
$innerHTML .= htmlentities(iconv('UTF-8', 'ISO-8859-1', $child->nodeValue));
break;
default:
echo 'Non text node has '.$child->childNodes->length.' children'.PHP_EOL;
echo 'Node name '.$child->nodeName.PHP_EOL;
$innerHTML .= '<'.$child->nodeName.'>';
$innerHTML .= _get_inner_html( $child );
$innerHTML .= '</'.$child->nodeName.'>';
break;
}
}
return $innerHTML;
}
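Since the version above drops attributes, here is a rough sketch of how they could be carried across as well. It is not the code I actually run: the function name inner_html_with_entities is invented for this example, it assumes PHP 7 or later, and it still writes void elements such as br with a closing tag.

```php
<?php
// Hypothetical variant: serialize child nodes, keeping attributes and
// re-encoding text as named entities.
function inner_html_with_entities(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            // Text comes back from DOM as UTF-8, so encode it as such.
            $out .= htmlentities($child->nodeValue, ENT_QUOTES, 'UTF-8');
        } elseif ($child instanceof DOMElement) {
            $attrs = '';
            foreach ($child->attributes as $attr) {
                $attrs .= ' ' . $attr->nodeName . '="'
                    . htmlentities($attr->nodeValue, ENT_QUOTES, 'UTF-8') . '"';
            }
            $out .= '<' . $child->nodeName . $attrs . '>'
                . inner_html_with_entities($child)
                . '</' . $child->nodeName . '>';
        }
        // Comments and other node types are skipped in this sketch.
    }
    return $out;
}

$doc = new DOMDocument();
$doc->loadHTML('<ul><li>Entity <strong attr="3">in &frac12; tag</strong></li></ul>');
echo inner_html_with_entities($doc->getElementsByTagName('li')->item(0)), "\n";
```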

No need to iterate over child nodes:
function innerHTML($node)
{
$html = $node->ownerDocument->saveXML($node);
return preg_replace("%^<{$node->nodeName}[^>]*>|</{$node->nodeName}>$%", '', $html);
}

Change innerHTML of a php DOMElement [duplicate]

How do I change the innerHTML of a PHP DOMElement?
Another solution:
1) create new DOMDocumentFragment from the HTML string to be inserted;
2) remove old content of our element by deleting its child nodes;
3) append DOMDocumentFragment to our element.
function setInnerHTML($element, $html)
{
$fragment = $element->ownerDocument->createDocumentFragment();
$fragment->appendXML($html);
while ($element->hasChildNodes())
$element->removeChild($element->firstChild);
$element->appendChild($fragment);
}
Alternatively, we can replace our element with its clean copy and then append DOMDocumentFragment to this clone.
function setInnerHTML($element, $html)
{
$fragment = $element->ownerDocument->createDocumentFragment();
$fragment->appendXML($html);
$clone = $element->cloneNode(); // Get element copy without children
$clone->appendChild($fragment);
$element->parentNode->replaceChild($clone, $element);
}
Test:
$doc = new DOMDocument();
$doc->loadXML('<div><span style="color: green">Old HTML</span></div>');
$div = $doc->getElementsByTagName('div')->item(0);
echo $doc->saveHTML();
setInnerHTML($div, '<p style="color: red">New HTML</p>');
echo $doc->saveHTML();
// Output:
// <div><span style="color: green">Old HTML</span></div>
// <div><p style="color: red">New HTML</p></div>
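One caveat with the appendXML() approach is that the string has to be well-formed XML, not just valid HTML. The small check below is a sketch (variable names are arbitrary, PHP 7 or later assumed) comparing an HTML-style unclosed br with a self-closed one; appendXML() returns false when it cannot parse the input.

```php
<?php
$doc = new DOMDocument();
// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$htmlStyle = $doc->createDocumentFragment();
$okHtml = $htmlStyle->appendXML('<p>one<br>two</p>');   // unclosed <br>: not well-formed XML

$xmlStyle = $doc->createDocumentFragment();
$okXml = $xmlStyle->appendXML('<p>one<br/>two</p>');    // self-closed <br/>: well-formed

libxml_clear_errors();
libxml_use_internal_errors($previous);

var_dump($okHtml, $okXml);
```

So when the replacement markup comes from users or from HTML that was never meant to be XML, it is worth checking the return value before appending the fragment.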
I needed to do this for a project recently and ended up with an extension to DOMElement: http://www.keyvan.net/2010/07/javascript-like-innerhtml-access-in-php/
Here's an example showing how it's used:
<?php
require_once 'JSLikeHTMLElement.php';
$doc = new DOMDocument();
$doc->registerNodeClass('DOMElement', 'JSLikeHTMLElement');
$doc->loadHTML('<div><p>Para 1</p><p>Para 2</p></div>');
$elem = $doc->getElementsByTagName('div')->item(0);
// print innerHTML
echo $elem->innerHTML; // prints '<p>Para 1</p><p>Para 2</p>'
// set innerHTML
$elem->innerHTML = 'FF';
// print document (with our changes)
echo $doc->saveXML();
?>
I think the best thing you can do is come up with a function that will take the DOMElement that you want to change the InnerHTML of, copy it, and replace it.
In very rough PHP:
function replaceElement($el, $newInnerHTML) {
$newElement = $el->ownerDocument->createElement($el->nodeName, $newInnerHTML);
$el->parentNode->insertBefore($newElement, $el);
$el->parentNode->removeChild($el);
return $newElement;
}
This doesn't take into account attributes and nested structures, but I think this will get you on your way.
I ended up writing this function by combining pieces from other answers on this page. It is mostly Joanna Goch's version changed the way Peter Brand suggests, with some code added from Guest and from other places.
This function does not use an extension and does not use appendXML (which is very picky and breaks on even a single unclosed BR tag), and it seems to work well.
function set_inner_html( $element, $content ) {
$DOM_inner_HTML = new DOMDocument();
$internal_errors = libxml_use_internal_errors( true );
$DOM_inner_HTML->loadHTML( mb_convert_encoding( $content, 'HTML-ENTITIES', 'UTF-8' ) );
libxml_use_internal_errors( $internal_errors );
$content_node = $DOM_inner_HTML->getElementsByTagName('body')->item(0);
$content_node = $element->ownerDocument->importNode( $content_node, true );
while ( $element->hasChildNodes() ) {
$element->removeChild( $element->firstChild );
}
$element->appendChild( $content_node );
}
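A possible refinement, offered as a sketch and not as a replacement for the function above: import the children of a wrapper element instead of the body node itself, so no extra body element ends up inside the target. The name set_inner_html_children and the div wrapper are choices made for this example; it assumes PHP 7.1 or later and leaves character-encoding handling out, so it is only safe as written for ASCII or entity-encoded input.

```php
<?php
// Hypothetical variant: replace an element's children with parsed HTML.
function set_inner_html_children(DOMElement $element, string $content): void
{
    $tmp = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // The wrapper div keeps bare text from being wrapped in a <p> by the parser.
    $tmp->loadHTML('<div>' . $content . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $tmp->getElementsByTagName('div')->item(0);

    // Drop the old children.
    while ($element->hasChildNodes()) {
        $element->removeChild($element->firstChild);
    }
    // Import copies of the parsed nodes into the target document.
    foreach ($wrapper->childNodes as $child) {
        $element->appendChild($element->ownerDocument->importNode($child, true));
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<div id="t"><span>Old</span></div>');
$div = $doc->getElementsByTagName('div')->item(0);
set_inner_html_children($div, '<p>New<br>line</p>');
echo $doc->saveHTML($div), "\n";
```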
It seems that appendXML doesn't always work, for example if you try to append XML with 3 levels. Here is the function I wrote that always works (you want to set $content as the innerHTML of $element):
function setInnerHTML($DOM, $element, $content) {
$DOMInnerHTML = new DOMDocument();
$DOMInnerHTML->loadHTML($content);
$contentNode = $DOMInnerHTML->getElementsByTagName('body')->item(0)->firstChild;
$contentNode = $DOM->importNode($contentNode, true);
$element->appendChild($contentNode);
return $element;
}
Have a look at the PHP Simple HTML DOM Parser library: http://simplehtmldom.sourceforge.net/
It looks pretty straightforward. You can change the innertext property of your elements. It might help.
Here is a replace by class function I just wrote:
It will replace the innerHtml of a class. You can also specify the node type eg. div/p/a etc.
function replaceInnerHtmlByClass($html, $replace=null, $class=null, $nodeType=null){
if(!$nodeType){ $nodeType = '*'; }
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//{$nodeType}[contains(concat(' ', normalize-space(@class), ' '), '$class')]");
foreach($nodes as $node) {
while($node->childNodes->length){
$node->removeChild($node->firstChild);
}
$fragment = $dom->createDocumentFragment();
$fragment->appendXML($replace);
$node->appendChild($fragment);
}
return $dom->saveHTML($dom->documentElement);
}
Here is another function I wrote to remove nodes with a specific class but preserving the inner html.
Setting replace to true will discard the inner html.
Setting replace to any other content will replace the inner html with the provided content.
function stripTagsByClass($html, $class=null, $nodeType=null, $replace=false){
if(!$nodeType){ $nodeType = '*'; }
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//{$nodeType}[contains(concat(' ', normalize-space(@class), ' '), '$class')]");
foreach($nodes as $node) {
$innerHTML = '';
$children = $node->childNodes;
foreach($children as $child) {
$tmp = new DOMDocument();
$tmp->appendChild($tmp->importNode($child,true));
$innerHTML .= $tmp->saveHTML();
}
$fragment = $dom->createDocumentFragment();
if($replace !== null && $replace !== false){
if($replace === true){ $replace = ''; }
$innerHTML = $replace;
}
$fragment->appendXML($innerHTML);
$node->parentNode->replaceChild($fragment, $node);
}
return $dom->saveHTML($dom->documentElement);
}
These functions can easily be adapted to use other attributes as the selector.
I only needed it to evaluate the class attribute.
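One thing to watch with a class selector of this kind: matching a whole class name (so that "note" does not also match "footnote") needs the name padded with spaces inside contains(). A minimal sketch follows, assuming PHP 7 or later; the helper name query_by_class is hypothetical, and $class must not contain quote characters.

```php
<?php
// Hypothetical helper: find elements carrying an exact class token.
function query_by_class(DOMDocument $dom, string $class, string $nodeType = '*')
{
    $xpath = new DOMXPath($dom);
    // Pad both the attribute value and the class name with spaces so only
    // whole tokens match.
    return $xpath->query(
        "//{$nodeType}[contains(concat(' ', normalize-space(@class), ' '), ' {$class} ')]"
    );
}

$dom = new DOMDocument();
$dom->loadHTML('<div class="note big">a</div><div class="footnote">b</div>');
echo query_by_class($dom, 'note')->length, "\n"; // prints 1
```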
Developing on from Joanna Goch's answer, this function will insert either a text node or an HTML fragment:
function nodeFromContent($node, $content) {
//creates a text node, or dom node if content contains html
$lt = strpos($content, '<');
$gt = strrpos($content, '>');
if (!($lt === false || $gt === false) && $gt > $lt) {
//< followed by > means potentially contains HTML
$DOMInnerHTML = new DOMDocument();
$DOMInnerHTML->loadHTML($content);
$contentNode = $DOMInnerHTML->getElementsByTagName('body')->item(0);
$newNode = $node->ownerDocument->importNode($contentNode, true);
} else {
$newNode = $node->ownerDocument->createTextNode($content);
}
return $newNode;
}
Usage:
$newNode = nodeFromContent($node, $content);
$node->parentNode->insertBefore($newNode, $node);
//or $node->appendChild($newNode) depending on what you require
Here is how you do it:
$doc = new DOMDocument('');
$label = $doc->createElement('label');
$label->appendChild($doc->createTextNode('test'));
$li->appendChild($label);
echo $doc->saveHTML();
function setInnerHTML($DOM, $element, $innerHTML) {
$node = $DOM->createTextNode($innerHTML);
$element->appendChild($node);
}

Remove empty tags from a XML with PHP

Question
How can I remove empty xml tags in PHP?
Example:
$value1 = "2";
$value2 = "4";
$value3 = "";
$xml = '<parentnode>
<tag1> ' .$value1. '</tag1>
<tag2> ' .$value2. '</tag2>
<tag3> ' .$value3. '</tag3>
</parentnode>';
XML Result:
<parentnode>
<tag1>2</tag1>
<tag2>4</tag2>
<tag3></tag3> // <- Empty tag
</parentnode>
What I want!
<parentnode>
<tag1>2</tag1>
<tag2>4</tag2>
</parentnode>
The XML without the empty tags like "tag3"
Thanks!
You can use XPath with the predicate not(node()) to select all elements that do not have child nodes.
<?php
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadxml('<parentnode>
<tag1>2</tag1>
<tag2>4</tag2>
<tag3></tag3>
<tag2>4</tag2>
<tag3></tag3>
<tag2>4</tag2>
<tag3></tag3>
</parentnode>');
$xpath = new DOMXPath($doc);
foreach( $xpath->query('//*[not(node())]') as $node ) {
$node->parentNode->removeChild($node);
}
$doc->formatOutput = true;
echo $doc->savexml();
prints
<?xml version="1.0"?>
<parentnode>
<tag1>2</tag1>
<tag2>4</tag2>
<tag2>4</tag2>
<tag2>4</tag2>
</parentnode>
This works recursively and removes nodes that:
contain only whitespace
do not have attributes
do not have child nodes
// not(*): has no child elements
// not(@*): has no attributes
// not(text()[normalize-space()]): has no text other than whitespace
while (($node_list = $xpath->query('//*[not(*) and not(@*) and not(text()[normalize-space()])]')) && $node_list->length) {
foreach ($node_list as $node) {
$node->parentNode->removeChild($node);
}
}
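To see the repeated query at work, here is a self-contained sketch with a made-up document (element names are arbitrary, PHP 7 or later assumed). The first pass removes b, which makes a empty, so the second pass removes a; c survives because it has an attribute and d because it has text.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><a><b> </b></a><c id="1"/><d>text</d></root>');
$xpath = new DOMXPath($doc);

// No child elements, no attributes, no non-whitespace text.
$query = '//*[not(*) and not(@*) and not(text()[normalize-space()])]';

// Re-run the query until nothing matches, so parents emptied by a
// previous pass are removed too.
while (($list = $xpath->query($query)) && $list->length) {
    foreach ($list as $node) {
        $node->parentNode->removeChild($node);
    }
}

echo $doc->saveXML($doc->documentElement), "\n";
```

Note that with this query the root element itself is removed if everything below it turns out to be empty.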
$dom = new DOMDocument;
$dom->loadXML($xml);
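// Copy the live node list to an array first; removing nodes while iterating over it skips elements.
$elements = iterator_to_array($dom->getElementsByTagName('*'));
foreach($elements as $element) {
if ( ! $element->hasChildNodes() OR $element->nodeValue == '') {
$element->parentNode->removeChild($element);
}
}
echo $dom->saveXML();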
The solution that worked with my production PHP SimpleXMLElement object code, by using Xpath, was:
/*
* Remove empty (no children) and blank (no text) XML element nodes, but not an empty root element (/child::*).
* This does not work recursively; meaning after empty child elements are removed, parents are not reexamined.
*/
foreach( $this->xml->xpath('/child::*//*[not(*) and not(text()[normalize-space()])]') as $emptyElement ) {
unset( $emptyElement[0] );
}
Note that it is not required to use PHP DOM, DOMDocument, DOMXPath, or dom_import_simplexml().
// A recursive variant: repeat until a pass removes nothing
do {
$removed = false;
foreach( $this->xml->xpath('/child::*//*[not(*) and not(text()[normalize-space()])]') as $emptyElement ) {
unset( $emptyElement[0] );
$removed = true;
}
} while ($removed) ;
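The same loop as a standalone sketch, outside of a class. The $xml variable and the sample document are made up for illustration, and PHP 7 or later is assumed.

```php
<?php
$xml = new SimpleXMLElement('<root><a><b/></a><c>x</c></root>');

do {
    $removed = false;
    // Elements below the root with no child elements and no non-blank text.
    $empty = $xml->xpath('/child::*//*[not(*) and not(text()[normalize-space()])]');
    foreach ($empty as $emptyElement) {
        unset($emptyElement[0]); // removes the element itself from its parent
        $removed = true;
    }
} while ($removed);

echo $xml->asXML();
```

The first pass removes b, the second removes the now-empty a, and the third finds nothing and ends the loop.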
If you're going to be doing a lot of this, just do something like:
$value[] = "2";
$value[] = "4";
$value[] = "";
$xml = '<parentnode>';
for($i=1,$m=count($value); $i<$m+1; $i++)
$xml .= !empty($value[$i-1]) ? "<tag{$i}>{$value[$i-1]}</tag{$i}>" : null;
$xml .= '</parentnode>';
echo $xml;
Ideally, though, you should probably use DOMDocument.

adding rel="nofollow" while saving data

My application allows users to write comments on my website, and it's working fine. It also has a tool for inserting their own links into a comment, and I'm happy for the content to include those links.
Now I want to add rel="nofollow" to every link in the content they have written.
I would like to add rel="nofollow" using PHP, i.e. while saving the data.
So what's a simple way, using PHP, to add rel="nofollow", or to update an existing rel="someother" to rel="someother nofollow"?
A good example would be much appreciated.
Regexes really aren't the best tool for dealing with HTML, especially when PHP has a pretty good HTML parser built in.
This code will handle adding nofollow if the rel attribute is already populated.
$dom = new DOMDocument;
$dom->loadHTML($str);
$anchors = $dom->getElementsByTagName('a');
foreach($anchors as $anchor) {
$rel = array();
if ($anchor->hasAttribute('rel') AND ($relAtt = $anchor->getAttribute('rel')) !== '') {
$rel = preg_split('/\s+/', trim($relAtt));
}
if (in_array('nofollow', $rel)) {
continue;
}
$rel[] = 'nofollow';
$anchor->setAttribute('rel', implode(' ', $rel));
}
var_dump($dom->saveHTML());
The resulting HTML is in $dom->saveHTML(). Except it will wrap it with html, body elements, etc, so use this to extract just the HTML you entered...
$html = '';
foreach($dom->getElementsByTagName('body')->item(0)->childNodes as $element) {
$html .= $dom->saveXML($element, LIBXML_NOEMPTYTAG);
}
echo $html;
If you have PHP 5.3.6 or later, replace saveXML() with saveHTML() and drop the second argument.
Example
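This HTML...
<a href="">hello</a>
<a href="" rel="">hello</a>
<a href="" rel="hello there">hello</a>
<a href="" rel="nofollow">hello</a>
...is converted into...
<a href="" rel="nofollow">hello</a>
<a href="" rel="nofollow">hello</a>
<a href="" rel="hello there nofollow">hello</a>
<a href="" rel="nofollow">hello</a>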
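For PHP 5.3.6 and later, the whole thing could look roughly like the sketch below. The function name add_nofollow and the wrapper div are choices made for this example and not part of the code above, the syntax assumes PHP 7 or later, and encoding handling is left out.

```php
<?php
// Hypothetical helper: add rel="nofollow" to every link in an HTML fragment.
function add_nofollow(string $html): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so it can be pulled back out without html/body tags.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $anchor) {
        // getAttribute() returns '' when rel is missing; drop empty tokens.
        $rel = preg_split('/\s+/', trim($anchor->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
        if (!in_array('nofollow', $rel, true)) {
            $rel[] = 'nofollow';
            $anchor->setAttribute('rel', implode(' ', $rel));
        }
    }

    // Serialize only the children of the wrapper.
    $out = '';
    foreach ($dom->getElementsByTagName('div')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo add_nofollow('See <a href="http://example.com" rel="external">this</a>.'), "\n";
```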
Good answer, Alex. It is more useful in the form of a function, so I wrote one below:
function add_no_follow($str){
$dom = new DOMDocument;
$dom->loadHTML($str);
$anchors = $dom->getElementsByTagName('a');
foreach($anchors as $anchor) {
$rel = array();
if ($anchor->hasAttribute('rel') AND ($relAtt = $anchor->getAttribute('rel')) !== '') {
$rel = preg_split('/\s+/', trim($relAtt));
}
if (in_array('nofollow', $rel)) {
continue;
}
$rel[] = 'nofollow';
$anchor->setAttribute('rel', implode(' ', $rel));
}
$html = '';
foreach($dom->getElementsByTagName('body')->item(0)->childNodes as $element) {
$html .= $dom->saveXML($element, LIBXML_NOEMPTYTAG);
}
return $html;
}
Use as follows :
$str = "Some content with link Some content ... ";
$str = add_no_follow($str);
I've copied Alex's answer and made it into a function that makes links nofollow and open in a new tab/window (and added UTF-8 support). I'm not sure if this is the best way to do this, but it works (constructive input is welcome):
function nofollow_new_window($str)
{
$dom = new DOMDocument;
$dom->loadHTML($str);
$anchors = $dom->getElementsByTagName('a');
foreach($anchors as $anchor)
{
$rel = array();
if ($anchor->hasAttribute('rel') AND ($relAtt = $anchor->getAttribute('rel')) !== '') {
$rel = preg_split('/\s+/', trim($relAtt));
}
// Add nofollow only when it is missing, without skipping the target handling below.
if (!in_array('nofollow', $rel)) {
$rel[] = 'nofollow';
$anchor->setAttribute('rel', implode(' ', $rel));
}
$target = array();
if ($anchor->hasAttribute('target') AND ($targetAtt = $anchor->getAttribute('target')) !== '') {
$target = preg_split('/\s+/', trim($targetAtt));
}
if (!in_array('_blank', $target)) {
$target[] = '_blank';
$anchor->setAttribute('target', implode(' ', $target));
}
}
$str = utf8_decode($dom->saveHTML($dom->documentElement));
return $str;
}
Simply use the function like this:
$str = '<html><head></head><body>fdsafffffdfsfdffff dfsdaff flkklfd aldsfklffdssfdfds Google</body></html>';
$str = nofollow_new_window($str);
echo $str;

How to replace text in HTML

From this question: What regex pattern do I need for this? I've been using the following code:
function process($node, $replaceRules) {
if($node->hasChildNodes()) {
foreach ($node->childNodes as $childNode) {
if ($childNode instanceof DOMText) {
$text = preg_replace(
array_keys($replaceRules),
array_values($replaceRules),
$childNode->wholeText
);
$node->replaceChild(new DOMText($text),$childNode);
} else {
process($childNode, $replaceRules);
}
}
}
}
$replaceRules = array(
'/\b(c|C)olor\b/' => '$1olour',
'/\b(kilom|Kilom|M|m)eter/' => '$1etre',
);
$htmlString = "<p><span style='color:red'>The color of the sky is: gray</p>";
$doc = new DOMDocument();
$doc->loadHtml($htmlString);
process($doc, $replaceRules);
$string = $doc->saveHTML();
echo mb_substr($string,119,-15);
It works fine, but it fails (as the child node is replaced on the first instance) if an element contains both text and other elements. So it works on
<div>The distance is four kilometers</div>
but not
<div>The distance is four kilometers<br>1000 meters to a kilometer</div>
or
<div>The distance is four kilometers<div class="guide">1000 meters to a kilometer</div></div>
Any ideas of a method that would work on such examples?
Calling $node->replaceChild will confuse the $node->childNodes iterator. You can get the child nodes first, and then process them:
function process($node, $replaceRules) {
if($node->hasChildNodes()) {
$nodes = array();
foreach ($node->childNodes as $childNode) {
$nodes[] = $childNode;
}
foreach ($nodes as $childNode) {
if ($childNode instanceof DOMText) {
$text = preg_replace(
array_keys($replaceRules),
array_values($replaceRules),
$childNode->wholeText);
$node->replaceChild(new DOMText($text),$childNode);
}
else {
process($childNode, $replaceRules);
}
}
}
}
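Another way to avoid the iterator problem, sketched here under a few assumptions: collect the text nodes with XPath (the result of a query is a static list) and change each node's text in place instead of replacing the node. The function name replace_in_text_nodes is made up, PHP 7.1 or later is assumed, and //text() also matches text inside script and style elements, which you may want to exclude.

```php
<?php
// Hypothetical helper: apply regex rules to every text node of a document.
function replace_in_text_nodes(DOMDocument $doc, array $rules): void
{
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//text()') as $textNode) {
        // Changing the character data keeps the node in place, so no
        // sibling is skipped.
        $textNode->data = preg_replace(
            array_keys($rules),
            array_values($rules),
            $textNode->data
        );
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<div>The distance is four kilometers<br>1000 meters to a kilometer</div>');
replace_in_text_nodes($doc, ['/\b(kilom|Kilom|M|m)eter/' => '$1etre']);
echo $doc->saveHTML($doc->getElementsByTagName('div')->item(0)), "\n";
```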
