removeNode doesn't work correctly

I want to remove all anchor tags whose href starts with '/'. This is my code:
$html = <<<HTML
<ul>
<li><a href="/foo/bar1">link1</li>
<li><a href="/foo/bar2">link2</li>
<li><a href="/foo/bar3">link3</li>
</ul>
HTML;
$dom = new DOMDocument;
@$dom->loadHTML($html);
$tags = $dom->getElementsByTagName('a');
echo 'removed nodes:<br />';
foreach ($tags as $tag)
{
$href = $tag->getAttribute('href');
if($href[0] == '/')
{
echo $tag->nodeValue.'<br />';
$tag->parentNode->removeChild($tag);
}
}
echo 'remined content:<br />';
echo $dom->saveXML($dom);
But the problem is that some of them remain:
removed nodes:<br>
link1<br>
link3<br>
remined content:<br>
<ul><li>
</li><li>link2</li>
<li>
</li></ul>
Any idea how to do this?
Thanks.

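You can't safely remove nodes from a DOMNodeList while iterating over it in a foreach loop (http://php.net/manual/en/domnode.removechild.php#90292). The list returned by getElementsByTagName is live, so each removal shifts the remaining nodes and the loop skips the next one. Collecting the nodes into an array first and then removing them works: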
<?php
$html = <<<HTML
<ul>
<li><a href="/foo/bar1">link1</a></li>
<li><a href="/foo/bar2">link2</a></li>
<li><a href="/foo/bar3">link3</a></li>
</ul>
HTML;
$dom = new DOMDocument;
@$dom->loadHTML($html);
$domNodeList = $dom->getElementsByTagName('a');
$domElemsToRemove = array();
foreach ($domNodeList as $domElement ) {
$domElemsToRemove[] = $domElement;
}
echo 'removed nodes:<br />';
foreach ($domElemsToRemove as $tag)
{
$href = $tag->getAttribute('href');
if($href[0] == '/')
{
echo $tag->nodeValue.'<br />';
$tag->parentNode->removeChild($tag);
}
}
echo 'remined content:<br />';
echo $dom->saveXML($dom);
EDIT: Also, you forgot to close the <a> tags.
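The same idea can be written without a hand-built queue. The following is only a sketch, assuming the anchors are properly closed: iterator_to_array() copies the live DOMNodeList into a plain array, so removals no longer shift the list under the loop. The helper name removeRootRelativeLinks and the sample markup are made up for this illustration.

```php
<?php
// Hypothetical helper: removes every <a> whose href starts with "/"
// and returns the text of the removed links.
function removeRootRelativeLinks(DOMDocument $dom): array
{
    $removed = [];
    // Snapshot the live node list so removals do not shift it mid-loop.
    foreach (iterator_to_array($dom->getElementsByTagName('a')) as $a) {
        // strncmp() is safe even when the href attribute is empty.
        if (strncmp($a->getAttribute('href'), '/', 1) === 0) {
            $removed[] = $a->nodeValue;
            $a->parentNode->removeChild($a);
        }
    }
    return $removed;
}

$html = '<ul>'
    . '<li><a href="/foo/bar1">link1</a></li>'
    . '<li><a href="/foo/bar2">link2</a></li>'
    . '<li><a href="http://example.com/">link3</a></li>'
    . '</ul>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$removed = removeRootRelativeLinks($dom);
echo implode(', ', $removed), PHP_EOL;                  // text of the removed links
echo $dom->getElementsByTagName('a')->length, PHP_EOL;  // anchors left in the document
```

With this sample, both root-relative links are removed in one pass and only the absolute link stays in the document.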

Related

Using Xpath to return multiple elements value

I have a result from a curl request from a page like this:
$result =
<div class="c-wrapper">
<a href="link-to-a-page.php">
<div class="c-content-img">
<img src="...">
</div>
<div class="c-link-data">
<div class="c-link-data-title">
<h4>TITLE</h4>
</div>
</div>
</a>
<div>
<div class="c-wrapper">
<div class="c-content-img">
<img src="...">
</div>
<div class="c-link-data">
<div class="c-link-data-title">
<h4>TITLE 2</h4>
</div>
</div>
<div>
Now I have to count how many c-wrapper elements are present.
This works correctly:
$doc = new DOMDocument();
@$doc->loadHTML($result);
$xpath = new DOMXPath($doc);
$divs = $xpath->query("//div[contains(@class, 'c-wrapper')]");
echo $divs->length; //<--- printed: 2
Then I have to print all titles.
This also works correctly:
$titles = $xpath->query("//div[contains(@class, 'c-link-data-title')]/h4");
foreach ($titles as $title) {
echo $title->textContent . "<br>";
}
Now the part I don't know: the first div contains a link, the second one does not. I'd like to change the loop that prints the titles like this:
foreach ($titles as $title) {
if ( $link_extracted !="" )
echo "<a href='" . $link_extracted . "'>" . $title->textContent . "</a><br>";
else
echo $title->textContent . "<br>";
}
How can I edit $titles = $xpath->query("//div[contains(@class, 'c-link-data-title')]/h4"); to achieve this?

Rather than doing this in separate stages, the code below finds the c-wrapper elements and then uses further XPath expressions to find the parts you want inside each one. So in
$link_extracted = $xpath->evaluate("a/@href", $div)[0];
it looks for an <a> element relative to the $div element, and [0] takes only the first match.
$doc = new DOMDocument();
@$doc->loadHTML($result);
$xpath = new DOMXPath($doc);
$divs = $xpath->query("//div[contains(@class, 'c-wrapper')]");
echo $divs->length;
foreach ( $divs as $div ) {
$link_extracted = $xpath->evaluate("a/@href", $div)[0];
$title = $xpath->evaluate("descendant::div[contains(@class, 'c-link-data-title')]/h4/text()"
, $div)[0];
if ( !empty($link_extracted->nodeValue) ) {
echo "<a href='" . $link_extracted->nodeValue . "'>" . $title->textContent . "</a><br>";
}
else {
echo $title->textContent . "<br>";
}
}
which for your test HTML gives...
2<a href='link-to-a-page.php'>TITLE</a><br>TITLE 2<br>
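A possible variation, offered as a sketch rather than a drop-in replacement: wrapping the expression in XPath's string() function makes evaluate() return a plain PHP string, which is empty when nothing matches, so the [0] lookup and the nodeValue check are not needed. The markup below is a shortened stand-in for $result, and libxml_use_internal_errors() is used here in place of the @ operator.

```php
<?php
// Shortened stand-in for the curl result: one wrapper with a link, one without.
$result = '<div class="c-wrapper"><a href="link-to-a-page.php">'
    . '<div class="c-link-data-title"><h4>TITLE</h4></div></a></div>'
    . '<div class="c-wrapper">'
    . '<div class="c-link-data-title"><h4>TITLE 2</h4></div></div>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($result);
$xpath = new DOMXPath($doc);

$lines = [];
foreach ($xpath->query("//div[contains(@class, 'c-wrapper')]") as $div) {
    // string(...) yields '' when the wrapper has no direct <a> child.
    $href  = $xpath->evaluate('string(a/@href)', $div);
    $title = $xpath->evaluate(
        "string(.//div[contains(@class, 'c-link-data-title')]/h4)",
        $div
    );
    $lines[] = $href !== ''
        ? "<a href='" . htmlspecialchars($href, ENT_QUOTES) . "'>" . htmlspecialchars($title) . "</a>"
        : htmlspecialchars($title);
}
echo implode("<br>\n", $lines), "<br>\n";
```

The htmlspecialchars() calls are an extra precaution for scraped text and are not part of the original answer.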

How to get url in <a> tag with Simple HTML DOM Parser?

<ul id="PrList" class="v2">
<li class="tools">
</li>
<li class="firstRow">
<div class="i">
<a href="www.google.com" title="Google" class="nC">
<img src="something">
</a>
</div>
</li>
</ul>
How to get just href attribute in <div class="i">?
I tried this-
$html = file_get_html($link);
$urls = [];
foreach($html->find('.i') as $element) {
$url = $element->find('.nC')->href;
if (!in_array($url, $urls)) {
echo $url . "<br/>";
$urls[] = $url;
}
}
but I received an error:-
Notice: Trying to get property of non-object
and I tried:-
$html = file_get_html($link);
$html = $html->find('div.i');
$html -> find('a',0)->href;
$echo $html;
but I received an error again:-
Fatal error: Call to a member function find() on array

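find() without an index returns an array of elements, so ->href cannot be read from its result directly. Select the <a> elements inside .i and loop over them: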
$html = file_get_html($link);
$urls = [];
foreach($html->find('.i a') as $element) {
$url = $element->href;
if (!in_array($url, $urls)) {
echo $url . "<br/>";
$urls[] = $url;
}
}
echo "<pre/>";print_r($urls);

Try this loop.
foreach($html->find('div[class=i] a') as $a){
var_dump($a->attr);
}

Unable pull out the node value of src using getattribute

I am trying to echo the href and the image src using getAttribute. The href is echoed correctly, but I am unable to retrieve the image src. Please guide me. Below is my HTML mockup:
<div id="hot-deals">
<div class="all-deals">
<ul>
<li><a href="http://url1.com">
<img src="http://imagelink1.com"></a>
</li>
<li><a href="http://url2.com">
<img src="http://imagelink2.com"></a>
</li>
<li><a href="http://url3.com">
<img src="http://imagelink3.com"></a>
</li>
</ul>
</div>
</div>
my code
$nodes = $my_xpath->query( '//div[@id="hot-deals"]/div[@class="all-deals"]/ul/li/a' );
foreach( $nodes as $node )
{
$title=$node->getAttribute('href');
$img=$node->getAttribute('img/src');
echo $title.",".$img."<br>";
}

src is not an attribute of the <a> tag, so you need one more step: get the inner <img> tag and then read its attribute.
foreach( $nodes as $node ) {
$title = $node->getAttribute('href');
$imgTags = $node->getElementsByTagName('img');
$img = $imgTags->item(0)->getAttribute('src');
echo $title . "," . $img . "<br>";
}
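To make the difference concrete, here is a small sketch with made-up sample markup in which the second link has no image. It shows that getAttribute() takes an attribute name rather than a path, and that item(0) returns null when no <img> is present, so a check before dereferencing avoids a fatal error.

```php
<?php
// Sample markup (assumed for illustration): the second link has no <img>.
$html = '<ul>'
    . '<li><a href="http://url1.com"><img src="http://imagelink1.com"></a></li>'
    . '<li><a href="http://url2.com">text only</a></li>'
    . '</ul>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$rows = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    // getAttribute() looks up an attribute literally named "img/src" on <a>,
    // which does not exist, so this is an empty string.
    $wrong = $a->getAttribute('img/src');

    // item(0) is null when the link contains no <img>, so check first.
    $img = $a->getElementsByTagName('img')->item(0);
    $src = $img !== null ? $img->getAttribute('src') : '';

    $rows[] = [$a->getAttribute('href'), $src, $wrong];
}

foreach ($rows as [$href, $src]) {
    echo $href, ',', $src, PHP_EOL;
}
```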

You can try this code.
<?php
$str = '<div id="hot-deals">
<div class="all-deals">
<ul>
<li><a href="http://url1.com">
<img src="http://imagelink1.com"></a>
</li>
<li><a href="http://url2.com">
<img src="http://imagelink2.com"></a>
</li>
<li><a href="http://url3.com">
<img src="http://imagelink3.com"></a>
</li>
</ul>
</div>
</div>';
$dom = new DOMDocument;
$dom->loadHTML($str); // loadHTML() is an instance method; calling it statically fails on PHP 8
$nodes = simplexml_import_dom($dom)->xpath('//div[@id="hot-deals"]/div[@class="all-deals"]/ul/li/a');
foreach( $nodes as $node )
{
$title = $node['href'];
$src = $node->img['src'];
echo $title ." " . $src . '<br>';
}

PHP searching with XPath

I have the following content:
<div class="item">
<a href="ONE">
<img src="TWO">
</a>
</div>
I want to use XPath to pull out "ONE" and "TWO" from there.
The code I have right now is:
$html = file_get_contents($_POST['url']);
$document = new DOMDocument();
$document->loadHTML ($html);
$selector = new DOMXPath($document);
$query = '//div[@class="item"]';
$anchors = $selector->query($query);
foreach ($anchors as $node) {
// print ONE;
// print TWO;
}

Here comes an example:
$html = <<<EOF
<div class="item">
<a href="ONE">
<img src="TWO">
</a>
</div>
EOF;
$doc = new DOMDocument();
$doc->loadHTML($html);
$selector = new DOMXPath($doc);
$links = $selector->query(
'//div[@class="item"]//@href | //div[@class="item"]//@src'
);
foreach($links as $link) {
echo $link->nodeValue . PHP_EOL;
}
If you want to break it down by <div class="item"> you can use the following code:
foreach($selector->query('//div[@class="item"]') as $div) {
foreach($selector->query('.//@href | .//@src', $div) as $link) {
echo $link->nodeValue . PHP_EOL;
}
}
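If the two values need to stay paired per item, one option is to evaluate each attribute relative to its div. This is a sketch with two sample items (the second one is made up), using XPath's string() so that evaluate() returns plain strings:

```php
<?php
// Two sample items; the second is an assumption added for illustration.
$html = '<div class="item"><a href="ONE"><img src="TWO"></a></div>'
      . '<div class="item"><a href="THREE"><img src="FOUR"></a></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$selector = new DOMXPath($doc);

$pairs = [];
foreach ($selector->query('//div[@class="item"]') as $div) {
    // Each expression is evaluated relative to the current <div>.
    $pairs[] = [
        'href' => $selector->evaluate('string(.//a/@href)', $div),
        'src'  => $selector->evaluate('string(.//img/@src)', $div),
    ];
}

foreach ($pairs as $pair) {
    echo $pair['href'], ' ', $pair['src'], PHP_EOL;
}
```

Keeping the values in one array per item avoids having to guess which src belongs to which href when an item is missing one of them.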

Remove all list elements except the first in PHP?

How do I remove all the li elements except the first in PHP?
<div class="category">
<ul class="products">
<li>{nested child elements}</li>
<li>{nested child elements}</li>
<li>{nested child elements}</li>
</ul>
</div>
The code above is generated by another script via a function.
The result should be like this:
<div class="category">
<ul class="products">
<li>{nested child elements}</li>
</ul>
</div>
UPDATE: Sorry guys, "category" is a class, not an ID.
In reply to Yoshi: ul.products has siblings, but I didn't include them in my post. Would that affect the query?
This is what my code looks like with Yoshi's code added:
class Myclass {
function prin_html() {
$content = get_code();
$dom = new DOMDocument;
$dom->loadXml($content);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//li[position()>1]') as $liNode) {
$liNode->parentNode->removeChild($liNode);
}
echo $dom->saveXml($dom->documentElement);
}
}
It still prints the unfiltered HTML code...

Try:
$dom = new DOMDocument;
$dom->loadHtml('<div id="category">
<ul class="products">
<li>{nested child elements}</li>
<li>{nested child elements}</li>
<li>{nested child elements}</li>
</ul>
</div>');
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//li[position()>1]') as $liNode) {
$liNode->parentNode->removeChild($liNode);
}
$output = '';
foreach ($xpath->query('//body/*') as $child) {
$output .= $dom->saveXml($child);
}
Output:
<div id="category">
<ul class="products">
<li>{nested child elements}</li>
</ul>
</div>
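Since the question's update says "category" is a class and the list items contain nested elements, the query can be narrowed. The following is only a sketch with made-up nested content: the child axis (/li) restricts the match to direct children of ul.products, so any li nested deeper inside the first item is left alone. The list returned by DOMXPath::query() is not live, so removing nodes inside the loop is safe here.

```php
<?php
// Sample markup (assumed): the first item contains its own nested list.
$html = '<div class="category"><ul class="products">'
    . '<li>one<ul><li>nested a</li><li>nested b</li></ul></li>'
    . '<li>two</li>'
    . '<li>three</li>'
    . '</ul></div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Only direct <li> children of ul.products after the first one are selected.
$query = '//div[@class="category"]/ul[@class="products"]/li[position()>1]';
foreach ($xpath->query($query) as $liNode) {
    $liNode->parentNode->removeChild($liNode);
}

// Serialize just the category div rather than the whole generated document.
$category = $xpath->query('//div[@class="category"]')->item(0);
echo $dom->saveHTML($category), PHP_EOL;
```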

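You could use DOMDocument. Note that getElementsByTagName returns a live list, so copy it to an array before removing nodes:
$dom = new DOMDocument;
$dom->loadHTML($html);
$ul = $dom->getElementById('category')->getElementsByTagName('ul')->item(0);
// Snapshot the live list; removing nodes while iterating it directly skips elements.
foreach(iterator_to_array($ul->getElementsByTagName('li')) as $index => $li) {
if ($index == 0) {
continue;
}
$ul->removeChild($li);
}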
