PHP: Can't remove node from DOMDocument


I can't remove a node from a DOMDocument (I get an exception).
My code:
<?php
function filterElements($htmlString) {
$doc = new DOMDocument();
$doc->loadHTML($htmlString);
$nodes = $doc->getElementsByTagName('a');
for ($i = 0; $i < $nodes->length; $i++) {
$node = $nodes->item($i);
if ($node->nodeValue == 'my_link') {
$doc->removeChild($node);
}
}
}
$htmlString = '<div>begin..</div>this tool<a name="my_link">Beo</a> great!<div>.end</div>';
filterElements($htmlString);
?>
Thanks,
Yosef

First off, which exception are you getting? It likely matters.
As for the specific problem, my guess is this:
The $node is not a child of the document; it is a child of its own parent node. So you'd need to do:
$node->parentNode->removeChild($node);
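A minimal sketch of that fix applied to the question's code. Two things here are assumptions rather than part of the original: the function name removeNamedAnchors is made up for illustration, and the match is done on the name attribute (the sample markup has name="my_link", while the node value is "Beo"). The loop runs backwards because the list returned by getElementsByTagName() is live and shrinks as nodes are removed.

```php
<?php
// Hypothetical helper: removes every <a> whose name attribute equals $name.
function removeNamedAnchors(string $htmlString, string $name): string
{
    $doc = new DOMDocument();
    $doc->loadHTML($htmlString);
    $nodes = $doc->getElementsByTagName('a');

    // Iterate backwards: the node list is live, so removing an item
    // shifts the indexes of everything after it.
    for ($i = $nodes->length - 1; $i >= 0; $i--) {
        $node = $nodes->item($i);
        if ($node->getAttribute('name') === $name) {
            // Remove the node through its actual parent, not the document.
            $node->parentNode->removeChild($node);
        }
    }

    return $doc->saveHTML();
}

$htmlString = '<div>begin..</div>this tool<a name="my_link">Beo</a> great!<div>.end</div>';
$result = removeNamedAnchors($htmlString, 'my_link');
echo $result;
```

Calling $doc->removeChild($node) only succeeds when $node is a direct child of the document itself, which is rarely the case for elements inside the body.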

Related

PHP: Remove a hyperlink from element but retain the text and class

I need to process a DOM and remove all hyperlinks to a particular site while retaining the underlying text. Thus, a hyperlink wrapping some text should become just the plain text. Taking a cue from this thread, I wrote this:
$as = $dom->getElementsByTagName('a');
for ($i = 0; $i < $as->length; $i++) {
$node = $as->item($i);
$link_href = $node->getAttribute('href');
if (strpos($link_href,'offendinglink.com') !== false) {
$cl = $node->getAttribute('class');
$text = new DomText($node->nodeValue);
$node->parentNode->insertBefore($text, $node);
$node->parentNode->removeChild($node);
$i--;
}
}
This works fine, except that I also need to retain the class attribute of the offending <a> tag and maybe turn the tag into a <div> or a <span>. Thus, I need this:
<a href="http://offendinglink.com" class="nice">text</a>
to turn into this:
<div class="nice">text</div>
How do I access the new element after it's been added (like in my code snippet)?

Regarding "How do I access the new element after it's been added (like in my code snippet)?": your element is in $text, I think. Anyway, this should work if you need to keep the class and the text content but nothing else:
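// Copy the live node list to an array first; removing nodes while iterating the live list skips elements.
foreach(iterator_to_array($dom->getElementsByTagName('a')) as $url){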
if(parse_url($url->getAttribute("href"),PHP_URL_HOST)!=='badsite.com') {
continue;
}
$ele = $dom->createElement("div");
$ele->textContent = $url->textContent;
$ele->setAttribute("class",$url->getAttribute("class"));
$url->parentNode->insertBefore($ele,$url);
$url->parentNode->removeChild($url);
}
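The same idea as a self-contained sketch. The function name replaceLinksWithDiv and the sample markup are assumptions made for illustration; the host badsite.com comes from the answer above. The live node list is copied into an array before any node is replaced, so no anchor is skipped.

```php
<?php
// Hypothetical helper: replaces every <a> pointing at $host with a <div>
// that keeps the link text and the class attribute.
function replaceLinksWithDiv(DOMDocument $dom, string $host): void
{
    // Snapshot the live list so replacing nodes does not disturb iteration.
    $anchors = iterator_to_array($dom->getElementsByTagName('a'));

    foreach ($anchors as $anchor) {
        if (parse_url($anchor->getAttribute('href'), PHP_URL_HOST) !== $host) {
            continue;
        }
        $div = $dom->createElement('div');
        // createTextNode() escapes special characters in the text safely.
        $div->appendChild($dom->createTextNode($anchor->textContent));
        if ($anchor->hasAttribute('class')) {
            $div->setAttribute('class', $anchor->getAttribute('class'));
        }
        // replaceChild() swaps the nodes in one step; $div remains a
        // usable reference to the new element afterwards.
        $anchor->parentNode->replaceChild($div, $anchor);
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<div><a href="http://badsite.com/x" class="nice">text</a> and <a href="http://example.org/" class="keep">other</a></div>');
replaceLinksWithDiv($dom, 'badsite.com');
$result = $dom->saveHTML();
echo $result;
```

Because the new element is created before it is inserted, the variable holding it ($div here) is also the answer to "how do I access the new element after it's been added".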

Tested solution:
<?php
$str = "<b>Dummy</b> <a href='http://google.com' target='_blank' class='nice' id='nicer'>Google.com</a> <a href='http://yandex.ru' target='_blank' class='nice' id='nicer'>Yandex.ru</a>";
$doc = new DOMDocument();
$doc->loadHTML($str);
$anchors = $doc->getElementsByTagName('a');
$l = $anchors->length;
for ($i = 0; $i < $l; $i++) {
$anchor = $anchors->item(0);
$link = $doc->createElement('div', $anchor->nodeValue);
$link->setAttribute('class', $anchor->getAttribute('class'));
$anchor->parentNode->replaceChild($link, $anchor);
}
echo preg_replace(['/^\<\!DOCTYPE.*?<html><body>/si', '!</body></html>$!si'], '', $doc->saveHTML());

How to make crawling and extracting data in each pager links?

I want to extract the value of every name="" attribute from the pages of a website.
example html
<div class="link_row">
<a class="listing_container" name="7777">link</a>
</div>
I have the following code:
<?php
$html = new DOMDocument();
@$html->loadHtmlFile('http://www.onedomain.com/plus?ca=11_c&o=1');
$xpath = new DOMXPath( $html );
$nodelist = $xpath->query( "//div[@class='link_row']/a[@class='listing_container']/@name" );
foreach ($nodelist as $n){
echo $n->nodeValue."\n<br>";
}
?>
Result is:
7777
This code works, but it should not be limited to a single page number.
In http://www.onedomain.com/plus?ca=11_c&o=1 the pager parameter is "o=1".
Once o=1 is finished, I would like to continue with o=2,
and so on up to my variable $last=556, which corresponds to http://www.onedomain.com/plus?ca=11_c&o=556
Could you help me?
What is the best way to do this?
Thanks

Use a for (or while) loop. I don't see $last in your provided code so I've statically set the max value plus one.
$html = new DOMDocument();
for($i =1; $i < 557; $i++) {
@$html->loadHtmlFile('http://www.onedomain.com/plus?ca=11_c&o=' . $i);
$xpath = new DOMXPath( $html );
$nodelist = $xpath->query( "//div[@class='link_row']/a[@class='listing_container']/@name" );
foreach ($nodelist as $n){
echo $n->nodeValue."\n<br>";
}
}
Simpler example:
for($i =1; $i < 557; $i++) {
echo $i;
}
http://php.net/manual/en/control-structures.for.php
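A sketch of the same loop with the extraction pulled out into a function, so it can be tried without hitting the network. The function names extractNames and crawlPages, and the idea of passing the downloader in as a callable, are assumptions for illustration; the URL and the XPath expression come from the question.

```php
<?php
// Hypothetical helper: returns the name attribute of every matching link.
function extractNames(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $names = [];
    $query = "//div[@class='link_row']/a[@class='listing_container']/@name";
    foreach ($xpath->query($query) as $attribute) {
        $names[] = $attribute->nodeValue;
    }
    return $names;
}

// Hypothetical helper: walks pages 1..$last and merges the results.
// $fetch is any callable that takes a URL and returns the page HTML.
function crawlPages(int $last, callable $fetch): array
{
    $all = [];
    for ($page = 1; $page <= $last; $page++) {
        $query = http_build_query(['ca' => '11_c', 'o' => $page]);
        $html = $fetch('http://www.onedomain.com/plus?' . $query);
        if ($html === false || $html === '') {
            continue; // skip pages that could not be downloaded
        }
        $all = array_merge($all, extractNames($html));
    }
    return $all;
}

// Real usage would look like this (performs 556 HTTP requests):
// $names = crawlPages(556, 'file_get_contents');
```

Using <= $last in the loop condition avoids having to hard-code "the max value plus one".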

PHP XPath - query finds too many nodes

I'm trying to duplicate a row (the one with data-id='first') from a template three times and fill the placeholder ({first}) with a value (0, 1, 2 in this case). My simple code is below. I don't understand why this line, $nodeList = $xpath->query("//*[text()[contains(.,'first')]]", $newNode);, finds more than one node containing the text 'first'. It finds both rows, the cloned one and the original one, so it replaces the text in both of them, while it should replace it only in the new one. Note that I'm passing the second parameter to $xpath->query, which should make the search relative to the node I just cloned.
Here's a fiddle: https://eval.in/170941
HTML:
<html>
<head>
<title>test</title>
</head>
<body>
<table>
<tr data-id="first">
<td>{first}</td>
</tr>
</table>
</body>
</html>
PHP:
<?php
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$element = $xpath->query("//*[@data-id='first']")->item(0);
$element->removeAttribute("data-id");
$parent = $element->parentNode;
for ($i = 0; $i < 3; $i++) {
$newNode = $element->cloneNode(true);
$parent->insertBefore($newNode, $element);
$nodeList = $xpath->query("//*[text()[contains(.,'first')]]", $newNode);
for($j = 0; $j < $nodeList->length; $j++) {
$n = $nodeList->item($j);
$n->nodeValue = preg_replace("{{first}}", $i, $n->nodeValue);
}
}
$parent->removeChild($element);
echo $dom->saveHTML();
As you can see, the result is a table with three rows whose values are 0,0,0, while the expected values are 0,1,2.

Starting an XPath location path with / means that it starts at the document root. So //* always matches any element node in the document, and the context argument has no effect.
Try:
$nodeList = $xpath->query(".//*[text()[contains(.,'first')]]", $newNode);
HINT: DOMXPath::query() only allows expressions that return a node list; DOMXPath::evaluate() allows all expressions, for example count(//*).
HINT: DOMNodeList objects are iterable, so you can use foreach to loop over them.
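Putting the relative path and both hints together, a sketch based on the question's template. The variable names $template, $row and $cells, and the use of str_replace() instead of preg_replace(), are choices made for this illustration and not part of the original code.

```php
<?php
$html = '<html><head><title>test</title></head><body><table>'
      . '<tr data-id="first"><td>{first}</td></tr></table></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$template = $xpath->query("//*[@data-id='first']")->item(0);
$template->removeAttribute('data-id');
$parent = $template->parentNode;

for ($i = 0; $i < 3; $i++) {
    $row = $template->cloneNode(true);
    $parent->insertBefore($row, $template);

    // The leading dot makes the path relative to the context node $row,
    // so the untouched template row is never matched.
    foreach ($xpath->query(".//*[text()[contains(., 'first')]]", $row) as $cell) {
        $cell->nodeValue = str_replace('{first}', (string) $i, $cell->nodeValue);
    }
}
$parent->removeChild($template);

// evaluate() can return scalar results such as a count.
$rowCount = $xpath->evaluate('count(//tr)');

$cells = [];
foreach ($xpath->query('//td') as $td) {
    $cells[] = $td->nodeValue;
}

echo $dom->saveHTML();
```

Since the query never touches the template row, each clone still contains {first} and the rows end up with distinct values.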

The problem you are having is that you are cloning the original node, but in your first pass you're altering the original node's content. Every pass after that is copying the already modified node, so there is no {first} to find.
One solution is to make a clone of the source element which you never insert into the document, and use that inside your loop.
Here's my fiddle: https://eval.in/171149
<?php
$html = '<html><head><title>test</title></head><body><table><tr data-id="first"><td>{first}</td></tr></table></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$element = $xpath->query("//*[@data-id='first']")->item(0);
$element->removeAttribute("data-id");
$parent = $element->parentNode;
$clonedNode = $element->cloneNode(true);
for ($i = 0; $i < 3; $i++) {
$newNode = $clonedNode->cloneNode(true);
$parent->insertBefore($newNode, $element);
$nodeList = $xpath->query("//*[text()[contains(.,'first')]]", $newNode);
for($j = 0; $j < $nodeList->length; $j++) {
$n = $nodeList->item($j);
$n->nodeValue = preg_replace("{{first}}", $i, $n->nodeValue);
}
}
$parent->removeChild($element);
echo $dom->saveHTML();

Using DOMXpath how do I loop through the DOM and stop at the first piece of text?

I want to use DOMXPath to loop through the nodes of a DOM and stop when it reaches the first piece of text.
With this method I could capture and delete the first run of line breaks but leave the ones after "Hello World":
$html = '<br><br><br>Hello World<br><br><br>';
I'm not sure what the XPath query for plain text is, but I imagine the code would be something like this:
$doc = new DOMDocument();
$doc->loadHTML($html);
showDOMNode($doc);
$i = 1;
$dom_xpath = new DOMXpath($doc);
foreach($nodes as $node) {
do {
$node->parentNode->removeChild($node);
} while ($i > 0);
if($node == $xpath->query("/:TEXT")){
$i = 0;
}
}
This is just a rough piece of code, but what I want is something like that. Could somebody fill in the gaps for me, please?

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
foreach($xpath->query('//br[not(preceding::text())]') as $node) {
$node->parentNode->removeChild($node);
}
return $doc->saveHTML();
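The snippet above wrapped in a function so that the return statement has a home. The function name stripLeadingBreaks is an assumption for illustration. The expression //br[not(preceding::text())] selects every <br> that has no text node before it in document order, which is exactly the leading run.

```php
<?php
// Hypothetical helper: removes the <br> elements that come before the
// first piece of text and leaves all later ones alone.
function stripLeadingBreaks(string $html): string
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // query() returns a static list, so removing nodes inside the
    // foreach does not affect which nodes are visited.
    foreach ($xpath->query('//br[not(preceding::text())]') as $node) {
        $node->parentNode->removeChild($node);
    }

    return $doc->saveHTML();
}

$html = '<br><br><br>Hello World<br><br><br>';
$output = stripLeadingBreaks($html);
echo $output;
```

Note that saveHTML() returns a full document, so the result is wrapped in the doctype, html and body markup that loadHTML() added.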
@cHao the man!

PHP XPath to change stylesheet

I use XPath to change the href of the stylesheet <link> elements in the page header.
But it doesn't work at all.
$html=file_get_contents('http://stackoverflow.com');
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$css_links = $xpath->evaluate("//link[@type='text/css']");
for ($i = 0; $i < $css_links->length; $i++)
{
$csslink = $css_links->item($i);
$oldurl = $csslink->getAttribute('href');
$newURL='http://example.com/aaaa.css';
$csslink->removeAttribute('href');
$csslink->setAttribute('href', $newURL);
}
echo $html;

You're using @$doc->loadHTML(html); instead of @$doc->loadHTML($html); (note the $); otherwise it works.
Also use echo $doc->saveHTML() instead of echoing $html, because $html is the original string and is not changed by edits to the DOM.
You can also replace for ($i ...) with foreach, because DOMNodeList implements Traversable:
foreach ($css_links as $csslink)
{
$oldurl = $csslink->getAttribute('href');
$csslink->setAttribute('href', 'http://example.com/aaaa.css');
}
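A complete sketch of the corrected approach, using an inline sample page instead of a download so it runs offline. The function name replaceStylesheetHref and the sample markup are assumptions for illustration; the XPath expression and the new URL come from the question.

```php
<?php
// Hypothetical helper: points every text/css <link> at $newUrl and
// returns the modified document as HTML.
function replaceStylesheetHref(string $html, string $newUrl): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query("//link[@type='text/css']") as $link) {
        // setAttribute() overwrites an existing value, so a separate
        // removeAttribute() call is not needed.
        $link->setAttribute('href', $newUrl);
    }

    // Serialize the DOM; the input string itself is never modified.
    return $doc->saveHTML();
}

$sample = '<html><head><link rel="stylesheet" type="text/css" href="old.css">'
        . '</head><body><p>Hi</p></body></html>';
$output = replaceStylesheetHref($sample, 'http://example.com/aaaa.css');
echo $output;
```

libxml_use_internal_errors() is used here in place of the @ operator; both keep warnings about imperfect markup off the page.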
