How to remove an HTML element using the DOMDocument class - php

Is there a way to remove an HTML element by using the DOMDocument class?

In addition to Dave Morgan's answer, you can use DOMNode::removeChild to remove a child from its parent's list of children:
Removing a child by tag name
// The following example deletes the first table element from the HTML content.
$dom = new DOMDocument();
// avoid the whitespace after removing the node
$dom->preserveWhiteSpace = false;
// parse html dom elements
$dom->loadHTML($html_contents);
// get the table from dom
if ($table = $dom->getElementsByTagName('table')->item(0)) {
    // remove the node by telling the parent node to remove the child
    $table->parentNode->removeChild($table);
    // save the new document
    echo $dom->saveHTML();
}
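The snippet above removes only the first table. A minimal sketch for removing every table follows, assuming a made-up sample document in $html_contents. The list returned by getElementsByTagName is live, so removing nodes while iterating it directly can skip elements; copying it into an array first avoids that.

```php
<?php
// Hypothetical sample input, used only for illustration.
$html_contents = '<html><body><table><tr><td>A</td></tr></table>'
    . '<p>Keep me</p><table><tr><td>B</td></tr></table></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html_contents);
libxml_clear_errors();

// Take a snapshot of the live node list before modifying the tree.
$tables = iterator_to_array($dom->getElementsByTagName('table'));
foreach ($tables as $table) {
    $table->parentNode->removeChild($table);
}

$result = $dom->saveHTML();
echo $result;
```

With this sample input, both tables should be gone from $result while the paragraph stays.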
Removing a child by class name
// same beginning
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html_contents);
// use DOMXPath to find the table element with your class name
$xpath = new DOMXPath($dom);
$classname = 'MyTableName';
$xpath_results = $xpath->query("//table[contains(@class, '$classname')]");
// get the first table from XPath results
if ($table = $xpath_results->item(0)) {
    // remove the node the same way
    $table->parentNode->removeChild($table);
    echo $dom->saveHTML();
}
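One caveat with contains(@class, ...) is that it matches substrings, so 'MyTableName' would also match a class such as 'MyTableNameOther'. A possible sketch of a whole-token match, using the common concat/normalize-space XPath idiom, is below. The sample markup is an assumption for illustration.

```php
<?php
// Hypothetical sample input: the second table only shares a prefix with the class.
$html_contents = '<body><table class="MyTableName wide"><tr><td>A</td></tr></table>'
    . '<table class="MyTableNameOther"><tr><td>B</td></tr></table></body>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html_contents);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$classname = 'MyTableName';
// Pad the class attribute with spaces so only whole class tokens match.
$query = "//table[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";

foreach ($xpath->query($query) as $table) {
    $table->parentNode->removeChild($table);
}

$result = $dom->saveHTML();
echo $result;
```

Here only the table whose class list contains the exact token MyTableName should be removed.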
Resources
http://us2.php.net/manual/en/domnode.removechild.php
How to delete element with DOMDocument?
How to get full HTML from DOMXPath::query() method?

http://us2.php.net/manual/en/domnode.removechild.php
DOMDocument is a DOMNode, so you can just call removeChild and you should be fine.
EDIT: I just noticed you were probably talking about the page you are currently working with. I don't know whether DOMDocument would work for that. You may want to use JavaScript at that point (if the page has already been served to the client).

Related

Php DOMDocument set element value by name

I've got an element by its name attribute in PHP using a DOMDocument (I do not want to use id), but how can I then set its textContent and save it to the DOM object?
So far I have the following code:
$dom = new DOMDocument();
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('*') as $element) {
    $element_name = $element->getAttribute("name");
    if ($element_name == 'mytextareaname') {
        $element->textContent = "Some text content";
    }
}
$html_with_values = $dom->saveHTML();
But the values are not saved, probably because I need to reference the $dom object when saving rather than $element. How can I do that? Can I add a key to the foreach and use that?
Setting the DOM element's textContent for a textarea did not work, but setting its nodeValue updated both nodeValue and textContent.
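As a rough sketch of that approach, the element can also be located directly with an XPath attribute test instead of looping over every tag. The form markup below is an assumed example; only the name mytextareaname comes from the question.

```php
<?php
// Hypothetical sample form, used only for illustration.
$html = '<form><textarea name="mytextareaname"></textarea>'
    . '<input name="other" value="x"></form>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Select any element whose name attribute equals the wanted value.
foreach ($xpath->query('//*[@name="mytextareaname"]') as $element) {
    // $element belongs to $dom, so the change is part of the document.
    $element->nodeValue = 'Some text content';
}

$html_with_values = $dom->saveHTML();
echo $html_with_values;
```

No key is needed in the foreach: the nodes are objects that belong to $dom, so saving $dom afterwards includes the change.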

Strip out div with class and keep all other html within

My html content:
$content = '<div class="class-name some-other-class">
<p>ack</p>
</div>';
Goal: Remove the div with class="class-name" so that I'm left with:
<p>ack</p>
I know strip_tags($content, '<p>'); would do the job in this instance, but I want to be able to target divs with a certain class and preserve other divs, etc.
I'm also aware that you shouldn't parse HTML with regex, so what is the proper way to achieve this?
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($content); // loads your HTML
$xpath = new DOMXPath($doc);
// returns a list of all divs whose class attribute contains class-name
$nlist = $xpath->query("//div[contains(@class, 'class-name')]");
// Remove the nodes from the xpath query (their child nodes are removed with them)
foreach ($nlist as $node) {
    $node->parentNode->removeChild($node);
}
echo $doc->saveHTML();
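Note that removeChild drops the div together with everything inside it. If the goal is to keep the inner HTML, one possible sketch is to move the div's children in front of it before removing it. The second div in the sample input is an assumption added to show that other divs are left alone.

```php
<?php
// Sample input based on the question, plus a hypothetical second div.
$content = '<div class="class-name some-other-class"><p>ack</p></div>'
    . '<div class="keep"><p>other</p></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match divs that carry class-name as a whole class token.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' class-name ')]";

foreach ($xpath->query($query) as $div) {
    // Move each child out of the div, placing it just before the div.
    while ($div->firstChild) {
        $div->parentNode->insertBefore($div->firstChild, $div);
    }
    // The div is now empty and can be removed.
    $div->parentNode->removeChild($div);
}

$result = $doc->saveHTML();
echo $result;
```

The output still includes the doctype, html, and body wrappers that loadHTML adds; the targeted div should be gone while its paragraph remains.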
Maybe with some jQuery? '$(".class-name").remove();'

Remove HTML Tag using DOMDocument

I'd like to remove <font> tags from my html and am trying to use replaceChild to do so, but it doesn't seem to work properly. Can anyone catch what might be wrong?
$html = '<html><body><br><font class="heading2">Limited Size and Resources</font><p><br><strong>Q: When can a member use the limited size and resources exception?</strong></p></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$font_tags = $dom->GetElementsByTagName('font');
foreach($font_tags as $font_tag) {
    foreach($font_tag as $child) {
        $child->replaceChild($child->nodeValue, $font_tag);
    }
}
echo $dom->saveHTML();
From what I understand, $font_tags is a DOMNodeList, so I need to iterate through it twice in order to use the DOMNode::replaceChild function. I then want to replace the current value with just the content inside of the tags. However, when I output the $html nothing changes. Any ideas what could be wrong?
Here is a PHP Sandbox to test the code.
I'll put my remarks inline
$html = '<html><body><br><font class="heading2">Limited Size and Resources</font><p><br><strong>Q: When can a member use the limited size and resources exception?</strong></p></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$font_tags = $dom->GetElementsByTagName('font');
/* You only need one loop, as it is iterating your collection
   You would only need a second loop if each font tag had children of their own
   iterator_to_array takes a snapshot, because the node list is live and
   replacing nodes while iterating it directly can skip elements
*/
foreach (iterator_to_array($font_tags) as $font_tag) {
    /* replaceChild replaces children of the node being called
       So, to replace the font tag, call the function on its parent
       $prent will be that reference
    */
    $prent = $font_tag->parentNode;
    /* You can't insert arbitrary text, you have to create a textNode
       That textNode must also be a member of your document
    */
    $prent->replaceChild($dom->createTextNode($font_tag->nodeValue), $font_tag);
}
echo $dom->saveHTML();
Updated Sandbox: Hopefully I understood your requirements correctly
First, let us find out what wasn't working in your code.
1. foreach($font_tag as $child) never iterated even once, because $font_tag is a single 'font' element from the $font_tags list, not a list itself.
2. $child->replaceChild($child->nodeValue, $font_tag); - A child node can't replace its parent ($font_tag), because replaceChild is a method that a parent node uses to replace one of its children. For more details, check the PHP: DOMNode::replaceChild documentation, or point 2 below my code.
3. echo $html outputs the original $html string, not the updated $dom object that we are modifying.
This would work -
$html = '<html><body><br><font class="heading2">Limited Size and Resources</font><p><br><strong>Q: When can a member use the limited size and resources exception?</strong></p></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$font_tags = $dom->GetElementsByTagName('font');
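// iterator_to_array takes a snapshot, since replacing nodes while iterating the live list can skip elements
foreach (iterator_to_array($font_tags) as $font_tag)
{
    $new_node = $dom->createTextNode($font_tag->nodeValue);
    $font_tag->parentNode->replaceChild($new_node, $font_tag);
}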
echo $dom->saveHTML();
1. I am creating $new_node directly in the $dom, so the node belongs to the DOMDocument rather than being just a local value.
2. To replace the child object $font_tag, we first have to go up to its parent node using the parentNode property.
3. Finally, we print the modified $dom using the saveHTML method, which converts the DOMDocument into an HTML string.
Remove a specific span tag from HTML while preserving/keeping the inside content using PHP and DOMDocument
<?php
$content = '<span style="font-family: helvetica; font-size: 12pt;"><div>asdf</div><span>TWO</span>Business owners are fearful of leading. They would rather follow the leader than embrace a bold move that challenges their confidence. </span>';
$dom = new DOMDocument();
// Use LIBXML flags to prevent output of doctype, <html>, and <body> tags
$dom->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//span[@style="font-family: helvetica; font-size: 12pt;"]') as $span) {
    // Move all span tag content to its parent node just before it.
    while ($span->hasChildNodes()) {
        $child = $span->removeChild($span->firstChild);
        $span->parentNode->insertBefore($child, $span);
    }
    // Remove the span tag.
    $span->parentNode->removeChild($span);
}
// Get the final HTML with span tags stripped
$output = $dom->saveHTML();
print_r($output);

Getting an element from PHP DOM and changing its value

I'm using PHP/Zend to load html into a DOM, and then I get a specific div id that I want to modify.
$dom = new Zend_Dom_Query($html);
$element = $dom->query('div[id="someid"]');
How do I modify the text/content/html displayed inside that $element div, and then save the changes to the $dom or $html so I can print the modified html. Any idea how to do this?
Zend_Dom_Query is tailored just for querying a dom, so it doesn't provide an interface in and of itself to alter the dom and save it, but it does expose the PHP Native DOM objects that will let you do so. Something like this should work:
$dom = new Zend_Dom_Query($html);
$document = $dom->getDocument();
$elements = $dom->query('div[id="someid"]');
foreach ($elements as $element) {
    // $element is an instance of DOMElement (http://www.php.net/DOMElement)
    // You have to create new nodes off the document
    $node = $document->createElement("div", "contents of div");
    $element->appendChild($node);
}
$newHtml = $document->saveXML();
Take a look at the PHP Doc for DOMElement to get an idea of how you can alter the dom:
http://www.php.net/DOMElement
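If the aim is to replace the div's existing content rather than append to it, a minimal sketch with the native DOM classes might look like the following. It uses DOMDocument and DOMXPath directly as a stand-in for Zend_Dom_Query, and the sample markup and replacement text are assumptions.

```php
<?php
// Hypothetical sample input, used only for illustration.
$html = '<html><body><div id="someid"><b>old</b> text</div></body></html>';

$document = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);
foreach ($xpath->query('//div[@id="someid"]') as $element) {
    // Drop whatever the div currently contains.
    while ($element->firstChild) {
        $element->removeChild($element->firstChild);
    }
    // New nodes must be created from the owning document.
    $element->appendChild($document->createTextNode('new contents'));
}

$newHtml = $document->saveHTML();
echo $newHtml;
```

The same loop body should work on the DOMElement objects that a Zend_Dom_Query result yields, since they are native DOM nodes.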

PHP DOMDocument : how to select all links under a specific tag

I'm just getting started with PHP's DOMDocument and am having a little trouble.
How would I select all link nodes under a specific node?
In jQuery I could simply do $('h5 > a'),
and this would give me all the links under h5.
How would I do this in PHP using DOMDocument methods?
I tried using phpQuery, but for some reason it can't read the HTML page I'm trying to parse.
jQuery selectors like this one use CSS syntax, and any node such a selector can match can also be selected with XPath.
h5 > a means select any a node whose direct parent node is h5. This can easily be translated to an XPath query: //h5/a.
So, using DOMDocument:
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//h5/a');
foreach ($nodes as $node) {
    // do stuff
}
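To make the "do stuff" part concrete, here is a small sketch that collects the href and text of each matched link. The sample markup is made up; it includes a nested link and a link outside any h5 to show that //h5/a matches direct children only.

```php
<?php
// Hypothetical sample input, used only for illustration.
$html = '<h5><a href="/one">One</a></h5>'
    . '<h5><span><a href="/nested">Nested</a></span></h5>'
    . '<p><a href="/two">Two</a></p>';

$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$links = [];
// //h5/a selects a elements that are direct children of an h5.
foreach ($xpath->query('//h5/a') as $node) {
    $links[] = [$node->getAttribute('href'), trim($node->textContent)];
}

print_r($links);
```

To match links at any depth below an h5, the query //h5//a could be used instead.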
Retrieve the DOMElement whose children you are interested in and call DOMElement::getElementsByTagName on it.
Get all a tags from it, and loop through each one, checking whether its parent is an h5 tag.
// ...
$links = $document->getElementsByTagName('a');
$correct_tags = array();
foreach ($links as $link) {
    // nodeName is available on every node type, including the document itself
    if ($link->parentNode->nodeName == 'h5') {
        $correct_tags[] = $link;
    }
}
// do something with $correct_tags
