replace html using DOMDocument in PHP

I'm trying to clean up some bad HTML using DOMDocument. The HTML has a <div class="article"> element with <br/><br/> instead of </p><p>. I want to regex these into paragraphs, but I can't seem to get my node back into the original document:
//load entire doc
$doc = new DOMDocument();
$doc->loadHTML($htm);
$xpath = new DOMXpath($doc);
//get the article
$article = $xpath->query("//div[@class='article']")->parentNode;
//get as string
$article_htm = $doc->saveXML($article);
//regex the bad markup
$article_htm2 = preg_replace('/<br\/><br\/>/i', '</p><p>', $article_htm);
//create new doc w/ new html string
$doc2 = new DOMDocument();
$doc2->loadHTML($article_htm2);
$xpath2 = new DOMXpath($doc2);
//get the original article node
$article_old = $xpath->query("//div[@class='article']");
//get the new article node
$article_new = $xpath2->query("//div[@class='article']");
//replace original node with new node
$article->replaceChild($article_old, $article_new);
$article_htm_new = $doc->saveXML();
//dump string
var_dump($article_htm_new);
All I get is a 500 Internal Server Error. I'm not sure what I'm doing wrong.

There are several issues:
$xpath->query() returns a DOMNodeList, not a node. You must select an item from the list, for example with ->item(0).
replaceChild() expects the new node as its 1st argument and the node to replace as its 2nd.
$article_new is part of another document, so you must first import the node into $doc.
Fixed code:
//load entire doc
$doc = new DOMDocument();
$doc->loadHTML($htm);
$xpath = new DOMXpath($doc);
//get the article
$article = $xpath->query("//div[@class='article']")->item(0)->parentNode;
//get as string
$article_htm = $doc->saveXML($article);
//regex the bad markup
$article_htm2 = preg_replace('/<br\/><br\/>/i', '</p><p>', $article_htm);
//create new doc w/ new html string
$doc2 = new DOMDocument();
$doc2->loadHTML($article_htm2);
$xpath2 = new DOMXpath($doc2);
//get the original article node
$article_old = $xpath->query("//div[@class='article']")->item(0);
//get the new article node
$article_new = $xpath2->query("//div[@class='article']")->item(0);
//import the new node into $doc
$article_new = $doc->importNode($article_new, true);
//replace original node with new node
$article->replaceChild($article_new, $article_old);
$article_htm_new = $doc->saveHTML();
//dump string
var_dump($article_htm_new);
Instead of using 2 documents you may create a DocumentFragment of $article_htm2 and use this fragment as replacement.
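The fragment approach mentioned above could look roughly like the following. This is only a sketch: the sample markup in $htm is made up for illustration, and it assumes the string produced by saveXML() is well-formed XML, which appendXML() requires.

```php
<?php
// Hypothetical sample input; the real $htm would come from your page
$htm = '<html><body><div id="wrap"><div class="article"><p>One<br/><br/>Two</p></div></div></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($htm);
$xpath = new DOMXPath($doc);

// get the original article node
$article_old = $xpath->query("//div[@class='article']")->item(0);

// serialize as XML so the markup stays well-formed (<br/>, not <br>)
$article_htm = $doc->saveXML($article_old);

// regex the bad markup
$article_htm2 = preg_replace('/<br\s*\/>\s*<br\s*\/>/i', '</p><p>', $article_htm);

// build a fragment owned by $doc, so no importNode() is needed
$fragment = $doc->createDocumentFragment();
$fragment->appendXML($article_htm2);

// replace the old node via its parent: new node first, old node second
$article_old->parentNode->replaceChild($fragment, $article_old);

$result = $doc->saveHTML();
echo $result;
```

Because the fragment is created by $doc itself, the second DOMDocument and the import step both go away.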

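I think replaceChild() should be called on the parent of the node being replaced:
$article_old->parentNode->replaceChild($article_new, $article_old);
The article is not a child of itself, so the call has to go through its parent, with the new node as the first argument.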

Related

print_r for nodeList is not working

I have the following source code:
<?php
function getTerms()
{
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('https://charitablebookings.com/terms'); // loads your HTML
$xpath = new DOMXPath($doc);
// returns a list of all divs with class "terms-conditions"
$nodeList = $xpath->query("//div[@class='terms-conditions']");
$temp_dom = new DOMDocument();
$node = $nodeList->item(0);
$temp_dom = new DOMDocument();
foreach($nodeList as $n) $temp_dom->appendChild($temp_dom->importNode($n,true));
print_r($temp_dom->saveHTML());
}
getTerms();
?>
I'm trying to get text from a web page by selecting an element with a specific class. Nothing shows up in my browser when I print_r the $temp_dom output, and $node is null. What am I doing wrong?
Thanks for your time.
The first issue is that DOMDocument's loadHTML() method expects HTML content as its first parameter, not a URL. Fetch the page first, for example with file_get_contents():
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$html = file_get_contents('https://charitablebookings.com/terms');
$doc->loadHTML($html);
The second problem is with your XPath expression, $xpath->query("//div[@class='terms-conditions']"): there is no div with a class of terms-conditions in the document as served (it probably gets added by some JavaScript loader).
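One way to make this kind of failure visible is to check the length of the node list before using item(0). The sketch below uses a made-up inline HTML string in place of the downloaded page, so the markup and the message text are assumptions for illustration only.

```php
<?php
// Hypothetical stand-in for the HTML returned by file_get_contents()
$html = '<html><body><div class="other">Hello</div></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodeList = $xpath->query("//div[@class='terms-conditions']");

if ($nodeList->length === 0) {
    // nothing matched: item(0) would be null here
    $result = 'No matching element found';
} else {
    // serialize only the first matching node
    $result = $doc->saveHTML($nodeList->item(0));
}

echo $result, PHP_EOL;
```

With the real page, an empty result from this check would suggest the element is not present in the HTML the server sends.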

PHP: Appending (adding) html content to existing element by ID

I need to find an element by ID using PHP and then append HTML content to it. It seems simple enough, but I'm new to PHP and can't find the right function for this.
$html = file_get_contents('http://example.com');
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$descBox = $doc->getElementById('element1');
I just don't know how to do the next step. Any help would be appreciated.
As chris mentioned in his comment, try using DOMNode::appendChild, which adds a child node to your selected element, together with DOMDocument::createElement to actually create the element, like so:
$html = file_get_contents('http://example.com');
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
//get the element you want to append to
$descBox = $doc->getElementById('element1');
//create the element to append to #element1
$appended = $doc->createElement('div', 'This is a test element.');
//actually append the element
$descBox->appendChild($appended);
Alternatively if you already have an HTML string you want to append you can create a document fragment like so:
$html = file_get_contents('http://example.com');
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
//get the element you want to append to
$descBox = $doc->getElementById('element1');
//create the fragment
$fragment = $doc->createDocumentFragment();
//add content to fragment
$fragment->appendXML('<div>This is a test element.</div>');
//actually append the element
$descBox->appendChild($fragment);
Please note that any elements added with JavaScript will be inaccessible to PHP.
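For the fragment variant, it may help to keep in mind that appendXML() expects well-formed XML rather than loose HTML, and returns false when parsing fails. A minimal sketch, assuming a small inline document and a hypothetical helper named appendMarkup():

```php
<?php
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div id="element1"></div></body></html>');
$descBox = $doc->getElementById('element1');

// Hypothetical helper: appends $markup to $target only if it parses as XML
function appendMarkup(DOMDocument $doc, DOMNode $target, string $markup): bool
{
    $fragment = $doc->createDocumentFragment();
    if (!$fragment->appendXML($markup)) {
        return false; // markup was not well-formed, nothing appended
    }
    $target->appendChild($fragment);
    return true;
}

// well-formed: the <br/> is self-closed
$good = appendMarkup($doc, $descBox, '<p>Line one<br/>Line two</p>');
// not well-formed XML: <br> is never closed
$bad = appendMarkup($doc, $descBox, '<p>Line one<br>Line two</p>');
libxml_clear_errors();

var_dump($good, $bad);
echo $doc->saveHTML($descBox);
```

Checking the return value avoids appending an empty fragment when the markup cannot be parsed.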
You can also append this way:
$html = '
<html>
<body>
<ul id="one">
<li>hello</li>
<li>hello2</li>
<li>hello3</li>
<li>hello4</li>
</ul>
</body>
</html>';
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
//get the element you want to append to
$descBox = $doc->getElementById('one');
//create the element to append to #one
$appended = $doc->createElement('li', 'This is a test element.');
//actually append the element
$descBox->appendChild($appended);
echo $doc->saveHTML();
Don't forget to call saveHTML() on the last line to output the result.

PHP DOMDocument - createDocumentFragment does not work with loadHTML

I have a string that contains HTML and I would like to insert this HTML in a DOMElement.
For that, I did:
$abstract = '<p xmlns:default="http://www.w3.org/1998/Math/MathML">Test string <formula type="inline"><default:math xmlns="http://www.w3.org/1998/Math/MathML"><default:mi>π</default:mi></default:math></formula></p>';
$dom = new \DOMDocument();
#$dom->loadHTML($abstract);
$frag = $dom->createDocumentFragment();
When I var_dump $frag->nodeValue, I get null. Any idea?
I am not sure what you expect: you are creating a new fragment but adding no content to it. Even if you did, it would not work, because the document fragment is not a regular node; it is a helper construct for adding an XML fragment to a document.
Here is an example:
$dom = new \DOMDocument();
$body = $dom->appendChild($dom->createElement('body'));
$fragment = $dom->createDocumentFragment();
$fragment->appendXml('<p>first</p>second');
$body->appendChild($fragment);
echo $dom->saveHtml();
Output:
<body><p>first</p>second</body>
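Applied to a string like the one in the question, the same pattern might look as follows. This is a sketch with a simplified version of the markup (the default: prefixes are dropped), and the wrapper element name "abstract" is an assumption for illustration. The text is read from the element the fragment was appended to, not from the fragment itself.

```php
<?php
// Simplified, well-formed variant of the string from the question
$abstract = '<p>Test string <formula type="inline">'
    . '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi></math>'
    . '</formula></p>';

$dom = new DOMDocument('1.0', 'UTF-8');

// hypothetical wrapper element that receives the parsed markup
$wrapper = $dom->appendChild($dom->createElement('abstract'));

$frag = $dom->createDocumentFragment();
$frag->appendXML($abstract);   // parse the string into the fragment
$wrapper->appendChild($frag);  // move the fragment's children into the document

echo $wrapper->textContent, PHP_EOL;
echo $dom->saveXML($wrapper), PHP_EOL;
```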

getting the source code of remote page then display only one div based on its id

Exactly as described in the title. Currently my code is:
<?php
$url = "remotesite.com/page1.html";
$html = file_get_contents($url);
$doc = new DOMDocument(); // create DOMDocument
libxml_use_internal_errors(true);
$doc->loadHTML($html); // load HTML you can add $html
$elements = $doc->getElementsByTagName('div');
?>
My coding skills are very basic, so at this point I am lost and don't know how to display only the div that has id=mydiv.
If you have PHP 5.3.6 or higher you can do the following:
$url = "http://remotesite.com/page1.html"; // include the scheme, otherwise PHP looks for a local file
$html = file_get_contents($url);
$doc = new DOMDocument(); // create DOMDocument
libxml_use_internal_errors(true);
$doc->loadHTML($html); // load HTML you can add $html
$testElement = $doc->getElementById('divIDName');
echo $doc->saveHTML($testElement);
http://php.net/manual/en/domdocument.getelementbyid.php
If you have an older version, I believe you would need to copy the DOM node, once you have found it with getElementById, into a new DOMDocument object:
$elementDoc = new DOMDocument();
$cloned = $testElement->cloneNode(TRUE);
$elementDoc->appendChild($elementDoc->importNode($cloned,TRUE));
echo $elementDoc->saveHTML();

Updating existing element in XML with PHP

Currently, I have this for appending data to my items file:
$xmldoc = new DOMDocument();
$xmldoc->load('ex.xml');
$item= $xmldoc->createElement('item');
$item->setAttribute('id', '100');
$item->setAttribute('category', 'Fitness');
$item->setAttribute('name', 'Basketball');
$item->setAttribute('url', 'http://google.com');
$item->setAttribute('description', 'This is a description');
$item->setAttribute('price', '899');
$xmldoc->getElementsByTagName('items')->item(0)->appendChild($item);
$xmldoc->save('ex.xml');
Now, before appending this, I would like to check for an existing "item" element that has the same id attribute value.
If one exists, it should be updated with the new data.
Currently it just appends and doesn't check anything.
$xmldoc = new DOMDocument();
$xmldoc->load('ex.xml');
$xpath = new DOMXPath($xmldoc);
$query = $xpath->query('/mainXML/items/item[@id = "100"]');
$create_new_node = false;
if($query->length == 0)
{
$item = $xmldoc->createElement('item');
$create_new_node = true;
}
else
{
$item = $query->item(0);
}
$item->setAttribute('id', '100');
$item->setAttribute('category', 'Fitness');
$item->setAttribute('name', 'Basketball');
$item->setAttribute('url', 'http://google.com');
$item->setAttribute('description', 'This is a description');
$item->setAttribute('price', '899');
if($create_new_node)
{
$xmldoc->getElementsByTagName('items')->item(0)->appendChild($item);
}
$xmldoc->save('ex.xml');
I haven't used this functionality, but DOMDocument::getElementById looks like a good match.
If you get a matching element, edit it; if not, go ahead and append a new one.
If you have a DTD for this xml file that specifies that the "id" attribute is an ID type (i.e. its value is unique in a document and uniquely identifies its element), then you can use DOMDocument::getElementById().
Most likely, however, you do not have a DTD. In this case, you should just use XPath:
$xmldoc = new DOMDocument();
$xmldoc->load('ex.xml');
$xpath = new DOMXPath($xmldoc);
$results = $xpath->query('//items/item[@id=100][1]'); // XPath positions start at 1; [0] never matches
if (!$results->length) {
$item= $xmldoc->createElement('item');
$item->setAttribute('id', '100');
$item->setAttribute('category', 'Fitness');
$item->setAttribute('name', 'Basketball');
$item->setAttribute('url', 'http://google.com');
$item->setAttribute('description', 'This is a description');
$item->setAttribute('price', '899');
$xmldoc->getElementsByTagName('items')->item(0)->appendChild($item);
$xmldoc->save('ex.xml');
}
You should also consider using SimpleXML for this task. The way this xml is structured and manipulated would probably be better-suited to SimpleXML.
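A SimpleXML version of the check-then-update logic could be sketched as below. The inline XML, the mainXML root name (taken from the XPath in the other answer), and the reduced attribute list are assumptions for illustration; with a real file you would load it with simplexml_load_file() and write it back with asXML('ex.xml').

```php
<?php
// Hypothetical stand-in for the contents of ex.xml
$xml = simplexml_load_string(
    '<mainXML><items><item id="99" name="Football"/></items></mainXML>'
);

$attrs = [
    'id'       => '100',
    'category' => 'Fitness',
    'name'     => 'Basketball',
    'price'    => '899',
];

// look for an existing item with the same id
$matches = $xml->xpath('/mainXML/items/item[@id="100"]');

if ($matches) {
    $item = $matches[0];                    // update the existing element
} else {
    $item = $xml->items->addChild('item');  // or create a new one
}

foreach ($attrs as $name => $value) {
    if (isset($item[$name])) {
        $item[$name] = $value;              // overwrite existing attribute
    } else {
        $item->addAttribute($name, $value); // add missing attribute
    }
}

echo $xml->asXML();
```

Running the same block again against its own output would take the update branch instead of appending a second item.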
