PHP DOMDocument - createDocumentFragment does not work with loadHTML

I have a string that contains HTML and I would like to insert this HTML in a DOMElement.
For that, I did:
$abstract = '<p xmlns:default="http://www.w3.org/1998/Math/MathML">Test string <formula type="inline"><default:math xmlns="http://www.w3.org/1998/Math/MathML"><default:mi>π</default:mi></default:math></formula></p>';
$dom = new \DOMDocument();
#$dom->loadHTML($abstract);
$frag = $dom->createDocumentFragment();
When I var_dump $frag->nodeValue, I get null. Any idea why?

I am not sure what you expect: you create a new fragment but never add any content to it. Even if you did, nodeValue would still be null, because a document fragment is not a regular node with a value of its own; it is a helper construct for adding an XML fragment to a document.
Here is an example:
$dom = new \DOMDocument();
$body = $dom->appendChild($dom->createElement('body'));
$fragment = $dom->createDocumentFragment();
$fragment->appendXml('<p>first</p>second');
$body->appendChild($fragment);
echo $dom->saveHtml();
Output:
<body><p>first</p>second</body>
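If the string is HTML rather than well-formed XML, appendXML may reject it. One possible workaround, sketched below, is to parse the string in a temporary document and import the resulting nodes. The helper name appendHtml is hypothetical (it is not part of the DOM extension), and the sample input is an assumption:

```php
<?php
// Hypothetical helper (not part of ext/dom): parse an HTML string in a
// temporary document and append copies of its nodes to $target.
function appendHtml(DOMElement $target, string $html): void
{
    $tmp = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    // The wrapper <div> gives us a single known parent to read back from.
    $tmp->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $tmp->getElementsByTagName('div')->item(0);
    foreach ($wrapper->childNodes as $child) {
        // importNode() copies the node, so the list we iterate is not modified.
        $target->appendChild($target->ownerDocument->importNode($child, true));
    }
}

$dom = new DOMDocument();
$body = $dom->appendChild($dom->createElement('body'));
appendHtml($body, '<p>first</p>second<br>'); // <br> is fine here, unlike with appendXML
$out = $dom->saveHTML();
echo $out;
```

Note that loadHTML assumes ISO-8859-1 unless the markup declares another charset, so non-ASCII input such as the π above may need an encoding hint that this sketch does not handle.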

Related

print_r for nodeList is not working

I have the following source code:
<?php
function getTerms()
{
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('https://charitablebookings.com/terms'); // loads your HTML
$xpath = new DOMXPath($doc);
// returns a list of all divs with class terms-conditions
$nodeList = $xpath->query("//div[@class='terms-conditions']");
$temp_dom = new DOMDocument();
$node = $nodeList->item(0);
$temp_dom = new DOMDocument();
foreach($nodeList as $n) $temp_dom->appendChild($temp_dom->importNode($n,true));
print_r($temp_dom->saveHTML());
}
getTerms();
?>
With this I'm trying to get text from a web page by selecting a specific class. Nothing shows up in my browser when I print_r $temp_dom, and $node is null. What am I doing wrong?
Thanks for your time
The first issue is that DOMDocument's loadHTML method expects HTML content as its first parameter, not a URL. Fetch the page first and pass the markup in:
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$html = file_get_contents('https://charitablebookings.com/terms');
$doc->loadHTML($html);
The second problem is the XPath expression $xpath->query("//div[@class='terms-conditions']"): it matches nothing, because the fetched document contains no div with the class terms-conditions (it is probably added later by a JavaScript loader).
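A minimal sketch of the corrected flow, with a guard for the empty-result case. To stay self-contained it uses an inline HTML string in place of file_get_contents; that markup is an assumption for illustration and not the real page:

```php
<?php
// Sample markup standing in for the fetched page (an assumption).
$html = '<html><body><div class="terms-conditions"><p>Sample terms</p></div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Attributes are addressed with @ in XPath.
$nodeList = $xpath->query("//div[@class='terms-conditions']");

if ($nodeList === false || $nodeList->length === 0) {
    // Nothing matched: the element may only exist after JavaScript runs.
    $text = null;
} else {
    $text = trim($nodeList->item(0)->textContent);
}

var_dump($text);
```

Checking length before calling item(0) avoids working with a null node when the query matches nothing.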

PHP DOMDocument: Get inner HTML of node

When loading HTML into a <textarea>, I intend to treat different kinds of links differently. Consider the following links:
<a href="http://stackoverflow.com">http://stackoverflow.com</a>
<a href="http://stackoverflow.com">StackOverflow</a>
When the text inside a link matches its href attribute, I want to remove the HTML, otherwise the HTML remains unchanged.
Here's my code:
$body = 'Some HTML with a <a href="http://stackoverflow.com">http://stackoverflow.com</a>';
$dom = new DOMDocument;
$dom->loadHTML($body, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($dom->getElementsByTagName('a') as $node) {
$link_text = $node->ownerDocument->saveHTML($node->childNodes[0]);
$link_href = $node->getAttribute("href");
$link_node = $dom->createTextNode($link_href);
$node->parentNode->replaceChild($link_node, $node);
}
$html = $dom->saveHTML();
The problem with the above code is that DOMDocument encapsulates my HTML into a paragraph tag:
<p>Some HTML with a http://stackoverflow.com</p>
How do I get it to return only the inner HTML of that paragraph?
You need a root node to have a valid DOM document.
I suggest wrapping the input in a root <div>, so that any root node it already has is not destroyed.
Finally, read the nodeValue of the root node, or strip the wrapper with substr().
$body = 'Some HTML with a <a href="http://stackoverflow.com">http://stackoverflow.com</a>';
$body = '<div>'.$body.'</div>';
$dom = new DOMDocument;
$dom->loadHTML($body, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($dom->getElementsByTagName('a') as $node) {
$link_text = $node->ownerDocument->saveHTML($node->childNodes[0]);
$link_href = $node->getAttribute("href");
$link_node = $dom->createTextNode($link_href);
$node->parentNode->replaceChild($link_node, $node);
}
// or probably better :
$html = $dom->saveHTML() ;
$html = substr($html,5,-7); // remove <div>
var_dump($html); // "Some HTML with a http://stackoverflow.com"
This also works if the input string is:
<p>Some HTML with a <a href="http://stackoverflow.com">http://stackoverflow.com</a></p>
outputs :
<p>Some HTML with a http://stackoverflow.com</p>
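As an alternative to the fixed substr() offsets, the inner HTML can be built by serializing each child of the wrapper. This is only a sketch: the helper name innerHtml is hypothetical, and the sample input assumes the link markup that the question describes.

```php
<?php
// Hypothetical helper: concatenate the serialized children of a node.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Assumed input: a link whose text equals its href.
$body = 'Some HTML with a <a href="http://stackoverflow.com">http://stackoverflow.com</a>';

$dom = new DOMDocument();
$dom->loadHTML('<div>' . $body . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Copy the live node list first, because nodes are replaced inside the loop.
foreach (iterator_to_array($dom->getElementsByTagName('a')) as $a) {
    if ($a->textContent === $a->getAttribute('href')) {
        $a->parentNode->replaceChild($dom->createTextNode($a->textContent), $a);
    }
}

// With LIBXML_HTML_NOIMPLIED the wrapper <div> is the document element.
$result = innerHtml($dom->documentElement);
var_dump($result);
```

This avoids depending on the exact length of the wrapper tags and the trailing newline that saveHTML adds.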

PHP: Appending (adding) html content to existing element by ID

I need to search for an element by ID using PHP and then append HTML content to it. It seems simple enough, but I'm new to PHP and can't find the right function to do this.
$html = file_get_contents('http://example.com');
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$descBox = $doc->getElementById('element1');
I just don't know how to do the next step. Any help would be appreciated.
As chris mentioned in his comment, try DOMNode::appendChild, which adds a child element to your selected element, together with DOMDocument::createElement to actually create the element, like so:
$html = file_get_contents('http://example.com');
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
//get the element you want to append to
$descBox = $doc->getElementById('element1');
//create the element to append to #element1
$appended = $doc->createElement('div', 'This is a test element.');
//actually append the element
$descBox->appendChild($appended);
Alternatively if you already have an HTML string you want to append you can create a document fragment like so:
$html = file_get_contents('http://example.com');
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
//get the element you want to append to
$descBox = $doc->getElementById('element1');
//create the fragment
$fragment = $doc->createDocumentFragment();
//add content to fragment
$fragment->appendXML('<div>This is a test element.</div>');
//actually append the element
$descBox->appendChild($fragment);
Please note that any elements added with JavaScript will be inaccessible to PHP.
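Both getElementById and appendXML can fail, so a guarded version may be useful. The following is a sketch only: the helper name appendMarkupById is hypothetical, and the inline HTML replaces the remote page so the example is self-contained.

```php
<?php
// Hypothetical helper: append an XML-compatible markup string to the element
// with the given id. Returns false when the element is missing or the markup
// cannot be parsed.
function appendMarkupById(DOMDocument $doc, string $id, string $markup): bool
{
    $target = $doc->getElementById($id);
    if ($target === null) {
        return false; // e.g. the element is only created client-side by JavaScript
    }
    $fragment = $doc->createDocumentFragment();
    if (!$fragment->appendXML($markup)) {
        return false; // markup was not well-formed XML
    }
    $target->appendChild($fragment);
    return true;
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
// Sample document (an assumption) in place of file_get_contents().
$doc->loadHTML('<html><body><div id="element1"></div></body></html>');

$ok = appendMarkupById($doc, 'element1', '<div>This is a test element.</div>');
$missing = appendMarkupById($doc, 'no-such-id', '<div>ignored</div>');
$html = $doc->saveHTML();
```

Here $ok reports whether the append happened, so the caller can decide what to do when the target id is absent.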
You can also append this way:
$html = '
<html>
<body>
<ul id="one">
<li>hello</li>
<li>hello2</li>
<li>hello3</li>
<li>hello4</li>
</ul>
</body>
</html>';
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
//get the element you want to append to
$descBox = $doc->getElementById('one');
//create the element to append to #one
$appended = $doc->createElement('li', 'This is a test element.');
//actually append the element
$descBox->appendChild($appended);
echo $doc->saveHTML();
Don't forget to call saveHTML on the last line.

Replace content of node using PHP and XPath

I have a string of 'source html' and a string of 'replacement html'. In the 'source html' I want to look for a node with a specific class and replace its content with my 'replacement html'. I have tried using the replaceChild method, but this seems to require that I traverse a level up (parentNode).
This doesn't work
$dom = new DOMDocument;
$dom->loadXml($sourceHTML);
$replacement = $dom->createDocumentFragment();
$replacement->appendXML($replacementHTML);
$xpath = new DOMXPath($dom);
$oldNode = $xpath->query('//div[contains(@class,"arrangement--index__field-dato")]')->item(0);
$oldNode->replaceChild($replacement, $oldNode);
This works, but it's not the content which is being replaced
$dom = new DOMDocument;
$dom->loadXml($sourceHTML);
$replacement = $dom->createDocumentFragment();
$replacement->appendXML($replacementHTML);
$xpath = new DOMXPath($dom);
$oldNode = $xpath->query('//div[contains(@class,"arrangement--index__field-dato")]')->item(0);
$oldNode->parentNode->replaceChild($replacement, $oldNode);
How do I replace the content or the node I have queried for?
Instead of replacing the node itself, remove its children and insert the new content as a child node. Removing nodes inside a foreach over childNodes skips entries, because the list is live, so loop on firstChild instead. Something like:
while ($oldNode->firstChild)
    $oldNode->removeChild($oldNode->firstChild);
$oldNode->appendChild($replacement);
This will replace the contents (children) instead of the node itself.
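A self-contained sketch of replacing a node's contents, assuming the replacement string is well-formed XML. The helper name replaceContent and the sample markup are hypothetical:

```php
<?php
// Hypothetical helper: drop every child of $node, then append $xml as its
// new content.
function replaceContent(DOMElement $node, string $xml): void
{
    // childNodes is a live list, so remove from the front until it is empty.
    while ($node->firstChild !== null) {
        $node->removeChild($node->firstChild);
    }
    $fragment = $node->ownerDocument->createDocumentFragment();
    $fragment->appendXML($xml);
    $node->appendChild($fragment);
}

// Sample source (an assumption) with several children to remove.
$sourceHTML = '<root><div class="box arrangement--index__field-dato">'
    . '<b>old</b> text <i>more</i></div></root>';

$dom = new DOMDocument();
$dom->loadXML($sourceHTML);
$xpath = new DOMXPath($dom);
$oldNode = $xpath->query('//div[contains(@class,"arrangement--index__field-dato")]')->item(0);

replaceContent($oldNode, '<span>new</span>');
$result = $dom->saveXML($oldNode);
```

Unlike removing only firstChild once, this clears text nodes and sibling elements as well, so the queried node ends up containing just the replacement.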
This seems to work!
$dom = new DOMDocument;
$dom->loadXml($sourceHTML);
$replacement = $dom->createDocumentFragment();
$replacement->appendXML($replacementHTML);
$xpath = new DOMXPath($dom);
$oldNode = $xpath->query('//div[contains(@class,"arrangement--index__field-dato")]')->item(0);
$oldNode->removeChild($oldNode->firstChild);
$oldNode->appendChild($replacement);

replace html using DOMDocument in PHP

I'm trying to clean up some bad HTML using DOMDocument. The HTML has a <div class="article"> element, with <br/><br/> instead of </p><p> -- I want to regex these into paragraphs, but I can't seem to get my node back into the original document:
//load entire doc
$doc = new DOMDocument();
$doc->loadHTML($htm);
$xpath = new DOMXpath($doc);
//get the article
$article = $xpath->query("//div[@class='article']")->parentNode;
//get as string
$article_htm = $doc->saveXML($article);
//regex the bad markup
$article_htm2 = preg_replace('/<br\/><br\/>/i', '</p><p>', $article_htm);
//create new doc w/ new html string
$doc2 = new DOMDocument();
$doc2->loadHTML($article_htm2);
$xpath2 = new DOMXpath($doc2);
//get the original article node
$article_old = $xpath->query("//div[@class='article']");
//get the new article node
$article_new = $xpath2->query("//div[@class='article']");
//replace original node with new node
$article->replaceChild($article_old, $article_new);
$article_htm_new = $doc->saveXML();
//dump string
var_dump($article_htm_new);
All I get is a 500 internal server error... not sure what I'm doing wrong.
There are several issues:
$xpath->query returns a nodeList, not a node. You must select an item from the nodeList
replaceChild() expects as 1st argument the new node, and as 2nd the node to replace
$article_new is part of another document, you first must import the node into $doc
Fixed code:
//load entire doc
$doc = new DOMDocument();
$doc->loadHTML($htm);
$xpath = new DOMXpath($doc);
//get the article
$article = $xpath->query("//div[@class='article']")->item(0)->parentNode;
//get as string
$article_htm = $doc->saveXML($article);
//regex the bad markup
$article_htm2 = preg_replace('/<br\/><br\/>/i', '</p>xxx<p>', $article_htm);
//create new doc w/ new html string
$doc2 = new DOMDocument();
$doc2->loadHTML($article_htm2);
$xpath2 = new DOMXpath($doc2);
//get the original article node
$article_old = $xpath->query("//div[@class='article']")->item(0);
//get the new article node
$article_new = $xpath2->query("//div[@class='article']")->item(0);
//import the new node into $doc
$article_new = $doc->importNode($article_new, true);
//replace original node with new node
$article->replaceChild($article_new, $article_old);
$article_htm_new = $doc->saveHTML();
//dump string
var_dump($article_htm_new);
Instead of using 2 documents you may create a DocumentFragment of $article_htm2 and use this fragment as replacement.
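A sketch of that fragment-based variant follows. It assumes the article markup is still well-formed XML after the regex, which appendXML requires; the sample input is an assumption chosen to satisfy that:

```php
<?php
// Sample input (an assumption for this sketch); the markup inside the article
// must remain well-formed XML after the regex for appendXML() to accept it.
$htm = '<html><body><div class="article"><p>One<br><br>Two</p></div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($htm);
$xpath = new DOMXPath($doc);

//get the original article node
$article_old = $xpath->query("//div[@class='article']")->item(0);

//serialize as XML so that <br> is written as <br/>, then regex the bad markup
$article_htm = $doc->saveXML($article_old);
$article_htm2 = preg_replace('/<br\/><br\/>/i', '</p><p>', $article_htm);

//build a fragment in the same document, so no importNode() is needed
$fragment = $doc->createDocumentFragment();
if ($fragment->appendXML($article_htm2)) {
    //replace original node with the fragment's content
    $article_old->parentNode->replaceChild($fragment, $article_old);
}

$article_htm_new = $doc->saveHTML();
var_dump($article_htm_new);
```

Because the fragment belongs to $doc from the start, the second DOMDocument and the import step are no longer needed.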
I think it should be:
$article->parentNode->replaceChild($article_new, $article_old);
since the article is not a child of itself.
