Remove specific class domdocument - php

I have this in a DOMDocument:
<div><span class="hello"></span></div>
And I want to remove all spans with that class. So I tried:
$xpath = new DOMXPath($doc);
$elements = $xpath->query('//span[@class="hello"]/..');
foreach ($elements as $el) {
$el->parentNode->removeChild($el);
}
But this removes the parent element as well (the div element). How can I only remove the span elements?

The /.. at the end of your XPath selector is selecting the parent element, not the <span> itself - .. means to work one level back up the tree, the same as in a directory path. So in your loop, where you say parentNode->removeChild, you're actually removing the div, since $el is already the span's parent element.
If you just remove the /.. from the end of the selector, the code should work as intended.
$xpath = new DOMXPath($doc);
$elements = $xpath->query('//span[@class="hello"]');
foreach ($elements as $el) {
$el->parentNode->removeChild($el);
}
Full example: https://3v4l.org/o4dRv

Related

Removing the child node of an XML file using DOM and PHP

I'm trying to delete a child node within an XML document using DOM and PHP, but I can't quite figure out how to do it. I do not have access to SimpleXML.
XML Layout:
<list>
<as>
<a>
<a1>delete</a1>
</a>
<a>
<a1>keep</a1>
</a>
</as>
</list>
PHP Code:
$xml = "file.xml";
// load() is an instance method; calling it statically fails on PHP 8
$dom = new DOMDocument();
$dom->load($xml);
$list = $dom->getElementsByTagName('as')->item(0);
//Cycle through <as> elements (there are multiple in the full file)
foreach($list->childNodes as $child) {
$subChild = substr($child->tagName, 0, -1);
$a = $dom->getElementsByTagName($subChild);
//Cycle through <a> elements
foreach($a as $node)
{
//Get status for status check
$check= $node->getElementsByTagName("a1")->item(0)->nodeValue;
if(strcmp($check,'delete')==0)
{
//code to delete here (I wish to delete the <a> that triggers this)
}
}
}
http://www.php.net/manual/en/class.domnode.php
http://www.php.net/manual/en/domnode.removechild.php
You need the parent of a node to remove it, and you've got it as a property of the node that you want to remove, so no biggie. The result would be:
$node->parentNode->removeChild($node);
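Applied to the XML from the question, that one-liner might be used as in the sketch below. It is only an illustration under a few assumptions: the XML is inlined rather than loaded from file.xml, and the live node list is first copied into an array with iterator_to_array() so that removing nodes does not disturb the iteration.

```php
<?php
// Sample XML inlined for this sketch; the question loads it from file.xml.
$xml = '<list><as><a><a1>delete</a1></a><a><a1>keep</a1></a></as></list>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// getElementsByTagName() returns a live list, so take a snapshot first.
$candidates = iterator_to_array($dom->getElementsByTagName('a'));

foreach ($candidates as $node) {
    // Status check on the first <a1> inside this <a>.
    $status = $node->getElementsByTagName('a1')->item(0);
    if ($status !== null && $status->nodeValue === 'delete') {
        // Ask the parent (<as>) to drop this <a>.
        $node->parentNode->removeChild($node);
    }
}

// Serialize only the root element, without the XML declaration.
$result = $dom->saveXML($dom->documentElement);
echo $result, "\n";
```

Only the <a> whose <a1> holds "delete" is removed; the other <a> stays in place.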

PHP DOM Get Links Inside DIV

I'm attempting to iterate thru DIV's and get all of the links from each DIV. I'd put this in an array, i.e.:
[Astronomy] // div @class=container
[link] http://www.nasa.gov
[link] http://www.seti.org
[Biology] // div @class=container
[link] http://www.biology.com
[Chemistry] // div @class=container
[link] http://www.chemistry.com
I can use DOM to get the text of the content inside the DIV's, but I can't figure out how to get the HREF Attribute of nodes inside the DIV. getAttribute isn't a method of Node. How can I iterate thru elements ('a') inside of an existing xpath?
$dom_document = new DOMDocument();
$dom_document->loadHTML($html);
$dom_xpath = new DOMXpath($dom_document);
$elements = $dom_xpath->query("*/div[@class='container']");
foreach($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
// ??? $links = $dom_xpath->query("//a");
}
}
You should try and use $element->getElementsByTagName('a') instead of using $element->childNodes.
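A minimal sketch of that suggestion follows. The markup is hypothetical and stands in for the asker's page, and the result is indexed by div position because the question does not say where the category names come from.

```php
<?php
// Hypothetical markup; the real page structure is not shown in the question.
$html = '<html><body>'
      . '<div class="container">'
      . '<a href="http://www.nasa.gov">NASA</a><a href="http://www.seti.org">SETI</a>'
      . '</div>'
      . '<div class="container"><a href="http://www.biology.com">Biology</a></div>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$result = [];
foreach ($xpath->query("//div[@class='container']") as $div) {
    $links = [];
    // Each matched $div is a DOMElement, so it can be searched directly.
    foreach ($div->getElementsByTagName('a') as $a) {
        $links[] = $a->getAttribute('href');
    }
    $result[] = $links;
}

print_r($result);
```

The nodes returned by getElementsByTagName('a') are DOMElement instances, which is why getAttribute() is available on them. A relative XPath query with the div as context node, $xpath->query('.//a', $div), should work as well.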

how do i extract values from multiple divs with xpath

How do I make this code snippet return the values for every div with class age on the page I am parsing rather than just the first one as it does now?
$nodelist = $xpath->query('//div[@class="age"]')->item(0);
print_r($nodelist->nodeValue);
I have some similar code that returns all the images I want but I can't seem to modify it to return the matching div values I want:
$nodelist = $xpath->query( "//div[@class='thumb-wrapper']" );
foreach ($nodelist as $node)
{
$tags = $node->getElementsByTagName('img');
$image = $tags->item(0)->getAttribute('src');
echo '<img src="'. $image .'" alt="image" ><br>';
}
You need to use "*".
Using the star (*) selects every element that is within the preceding
path. So if you wanted to match every element that is within a td tag
(such as p, div, etc.), you would write: //td/*
The problem with this code isn't the XPath; it's what you do with the result once it's returned.
$nodelist = $xpath->query('//div[@class="age"]')->item(0);
print_r($nodelist->nodeValue);
This gets all of the divs, then takes only the first one using ->item(0) and assigns that first item to the variable $nodelist.
Using your existing code as an example, you can alter it by removing the ->item(0), assigning all the results to $nodelist, and iterating through them just like the second 'working' example:
$nodelist = $xpath->query('//div[@class="age"]');
foreach ($nodelist as $node)
{
// Do something with each div
}
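To make the loop body concrete, here is a small sketch that collects the text of every matching div into an array. The sample HTML and the $ages variable are assumptions made for illustration.

```php
<?php
// Hypothetical page fragment with two matching divs and one non-matching div.
$html = '<html><body>'
      . '<div class="age">21</div><div class="age">34</div><div class="other">x</div>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$ages = [];
// No ->item(0) here: iterate over the whole DOMNodeList.
foreach ($xpath->query('//div[@class="age"]') as $node) {
    $ages[] = trim($node->nodeValue);
}

print_r($ages);
```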

How to delete element with DOMDocument?

Is it possible to delete element from loaded DOM without creating a new one? For example something like this:
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('a') as $href)
if($href->nodeValue == 'First')
//delete
You remove the node by telling the parent node to remove the child:
$href->parentNode->removeChild($href);
See DOMNode::$parentNode and DOMNode::removeChild() in the PHP documentation.
See as well:
How to remove attributes using PHP DOMDocument?
How to remove an HTML element using the DOMDocument class
This took me a while to figure out, so here's some clarification:
If you're deleting elements from within a loop (as in the OP's example), you need to loop backwards, because the node list is live and shrinks as nodes are removed:
$elements = $completePage->getElementsByTagName('a');
for ($i = $elements->length; --$i >= 0; ) {
$href = $elements->item($i);
$href->parentNode->removeChild($href);
}
DOMNodeList documentation: You can modify, and even delete, nodes from a DOMNodeList if you iterate backwards
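Combined with the condition from the question (delete only the anchors whose text is 'First'), the backwards loop could look like the sketch below. The sample HTML and the $remaining variable are assumptions added for illustration.

```php
<?php
// Hypothetical input with two anchors to delete and one to keep.
$html = '<html><body><p>'
      . '<a href="#1">First</a><a href="#2">Second</a><a href="#3">First</a>'
      . '</p></body></html>';

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);

$anchors = $dom->getElementsByTagName('a');
// Walk from the last index down so removals never shift unvisited items.
for ($i = $anchors->length - 1; $i >= 0; $i--) {
    $a = $anchors->item($i);
    if ($a->nodeValue === 'First') {
        $a->parentNode->removeChild($a);
    }
}

// Collect what is left to check the outcome.
$remaining = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    $remaining[] = $a->nodeValue;
}
print_r($remaining);
```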
Easily:
$href->parentNode->removeChild($href);
I know this has already been answered, but I wanted to add to it in case someone faces the same problem I did.
Looping through the DOMNodeList and removing items directly can cause issues.
I read the following page and, based on it, wrote a method in my own code base that works: https://www.php.net/manual/en/domnode.removechild.php
Here is what I would do:
$links = $dom->getElementsByTagName('a');
$links_to_remove = [];
foreach($links as $link){
$links_to_remove[] = $link;
}
foreach($links_to_remove as $link){
$link->parentNode->removeChild($link);
}
$dom->saveHTML();
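To remove a tag or something similar:
removeChild($element->id());
Full example (note that Dom here is not PHP's built-in DOMDocument; the loadFromUrl() and find() calls come from a separate HTML parser library):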
$dom = new Dom;
$dom->loadFromUrl('URL');
$html = $dom->find('main')[0];
$html2 = $html->find('p')[0];
$span = $html2->find('span')[0];
$html2->removeChild($span->id());
echo $html2;

Regex match HTML tag NOT containing another tag

I am writing a regex find/replace that will insert a <span> into every <a href> in a file where a <span> does not already exist. It will allow other tags to be in the <a href> like <img>, <b>, etc.
Currently I have this regex:
Find: (<a[^>]+?style=".*?color:#(\w{6}).*?".*?>)(.+?)(<\/a>)
Replace: '$1<span style="color:#$2;">$3</span>$4'
It works great, except that if I run it over the same file again, it will insert a <span> inside of a <span> and it gets messy.
Target Example:
We want it to ignore this:
<span style="color:#bfbcba;">Howdy</span>
But not this:
Howdy
Or this:
<img src="myimg.gif" />Howdy
--EDIT--
Using the PHP DOM library as suggested in the comments, this is what I have so far:
$doc = new DOMDocument();
$doc->loadHTML($input);
$tags = $doc->getElementsByTagName('a');
foreach ($tags as $tag) {
$spancount = $tag->getElementsByTagName("span")->length;
if($spancount == 0){
$element = $doc->createElement('span');
$tag->appendChild($element);
}
}
echo $doc->saveHTML();
Currently it will detect whether there is a span inside an anchor and, if there isn't, append an empty span to the inside of the anchor. However, I have yet to figure out how to get the original contents of the anchor inside the span.
Don't use regex for this, it's not ideal for HTML.
Use a DOM library and getElementsByTagName('a'), then iterate through each anchor and check whether it contains a span descendant with getElementsByTagName('span'), using the length property. If it doesn't, create a new span with createElement('span') on the document, move the anchor's existing child nodes into it with appendChild, and then append the span to the anchor.
EDIT: As for grabbing the inner html of the anchor, if there are lots of nodes inside, try using this:
<?php
function innerHTML($node){
$doc = new DOMDocument();
foreach ($node->childNodes as $child)
$doc->appendChild($doc->importNode($child, true));
return $doc->saveHTML();
}
$html = innerHTML( $anchorRef );
This may also help you out: Change innerHTML of a php DOMElement
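As an alternative to serializing the inner HTML, the anchor's child nodes can be moved into the new span directly. The sketch below assumes that approach; the sample input and the way the color is pulled out of the style attribute (reusing the color pattern from the question's regex) are illustrative choices, not the asker's final code.

```php
<?php
// Hypothetical input: one anchor without a span, one that already has one.
$input = '<html><body>'
       . '<a href="#a" style="color:#bfbcba;"><img src="myimg.gif">Howdy</a>'
       . '<a href="#b" style="color:#bfbcba;"><span style="color:#bfbcba;">Howdy</span></a>'
       . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($input);

foreach ($doc->getElementsByTagName('a') as $tag) {
    // Skip anchors that already contain a span, so reruns change nothing.
    if ($tag->getElementsByTagName('span')->length > 0) {
        continue;
    }

    $span = $doc->createElement('span');
    // Copy the color from the anchor's style attribute, if one is present.
    if (preg_match('/color:#(\w{6})/', $tag->getAttribute('style'), $m)) {
        $span->setAttribute('style', 'color:#' . $m[1] . ';');
    }

    // appendChild() moves a node, so this drains the anchor into the span.
    while ($tag->firstChild !== null) {
        $span->appendChild($tag->firstChild);
    }
    $tag->appendChild($span);
}

echo $doc->saveHTML();
```

Because existing spans are detected before anything is changed, running this over already processed markup should leave it untouched, which was the problem with the regex version.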
