removeNode doesn't work correctly - php

I want to remove all the anchor tags whose href starts with '/'. This is my code:
$html = <<<HTML
<ul>
<li><a href="/foo/bar1">link1</li>
<li><a href="/foo/bar2">link2</li>
<li><a href="/foo/bar3">link3</li>
</ul>
HTML;
$dom = new DOMDocument;
@$dom->loadHTML($html);
$tags = $dom->getElementsByTagName('a');
echo 'removed nodes:<br />';
foreach ($tags as $tag)
{
$href = $tag->getAttribute('href');
if($href[0] == '/')
{
echo $tag->nodeValue.'<br />';
$tag->parentNode->removeChild($tag);
}
}
echo 'remained content:<br />';
echo $dom->saveXML($dom);
But the problem is that some of them remain:
removed nodes:<br>
link1<br>
link3<br>
remained content:<br>
<ul><li>
</li><li>link2</li>
<li>
</li></ul>
Any idea how to do that?
Thanks.

getElementsByTagName() returns a live DOMNodeList, so removing nodes from it while you iterate over it in a foreach loop makes the iteration skip elements (http://php.net/manual/en/domnode.removechild.php#90292). Collecting the elements into a queue first and removing them afterwards works:
<?php
$html = <<<HTML
<ul>
<li><a href="/foo/bar1">link1</a></li>
<li><a href="/foo/bar2">link2</a></li>
<li><a href="/foo/bar3">link3</a></li>
</ul>
HTML;
$dom = new DOMDocument;
@$dom->loadHTML($html);
$domNodeList = $dom->getElementsByTagName('a');
// Copy the live node list into a plain array before removing anything.
$domElemsToRemove = array();
foreach ($domNodeList as $domElement) {
    $domElemsToRemove[] = $domElement;
}
echo 'removed nodes:<br />';
foreach ($domElemsToRemove as $tag) {
    $href = $tag->getAttribute('href');
    if ($href[0] == '/') {
        echo $tag->nodeValue.'<br />';
        $tag->parentNode->removeChild($tag);
    }
}
echo 'remained content:<br />';
echo $dom->saveXML($dom);
EDIT: you also forgot to close the <a> tags.
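As a variation on the queue idea above, here is a minimal sketch that takes the snapshot with iterator_to_array() instead of a manual loop. The sample HTML (including the example.com link) is made up for illustration, and strpos() is swapped in for $href[0] so that an empty href does not raise a warning.

```php
<?php
// Hypothetical sample input: two root-relative links and one absolute link.
$html = <<<HTML
<ul>
<li><a href="/foo/bar1">link1</a></li>
<li><a href="http://example.com/">link2</a></li>
<li><a href="/foo/bar3">link3</a></li>
</ul>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html);

// iterator_to_array() copies the nodes into a plain array, which does
// not change when elements are later detached from the document.
$anchors = iterator_to_array($dom->getElementsByTagName('a'));

$removed = [];
foreach ($anchors as $a) {
    // True only when the href begins with '/'; safe for empty strings.
    if (strpos($a->getAttribute('href'), '/') === 0) {
        $removed[] = $a->nodeValue;
        $a->parentNode->removeChild($a);
    }
}

echo 'removed: ', implode(', ', $removed), PHP_EOL;
echo 'anchors left: ', $dom->getElementsByTagName('a')->length, PHP_EOL;
```

Iterating the live list backwards with a for loop and item($i) should work as well; the snapshot is just easier to read.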

Related

Using Xpath to return multiple elements value

I have a result from a curl request from a page like this:
$result =
<div class="c-wrapper">
<a href="link-to-a-page.php">
<div class="c-content-img">
<img src="...">
</div>
<div class="c-link-data">
<div class="c-link-data-title">
<h4>TITLE</h4>
</div>
</div>
</a>
<div>
<div class="c-wrapper">
<div class="c-content-img">
<img src="...">
</div>
<div class="c-link-data">
<div class="c-link-data-title">
<h4>TITLE 2</h4>
</div>
</div>
<div>
Now I have to count how many c-wrapper elements are present. This works correctly:
$doc = new DOMDocument();
@$doc->loadHTML($result);
$xpath = new DOMXPath($doc);
$divs = $xpath->query("//div[contains(@class, 'c-wrapper')]");
echo $divs->length; //<--- printed: 2
Then I have to print all the titles. This also works correctly:
$titles = $xpath->query("//div[contains(@class, 'c-link-data-title')]/h4");
foreach ($titles as $title) {
echo $title->textContent . "<br>";
}
Now the part I don't know: the first div contains a link, the second one does not. I'd like to change the way I print the titles to something like this:
foreach ($titles as $title) {
if ( $link_extracted !="" )
echo "<a href='" . $link_extracted . "'>" . $title->textContent . "</a><br>";
else
echo $title->textContent . "<br>";
}
How can I edit $titles = $xpath->query("//div[contains(@class, 'c-link-data-title')]/h4"); to achieve this?
Rather than doing this in separate stages, the code below finds the c-wrapper elements and then uses XPath again to find the parts you want inside each one. So in
$link_extracted = $xpath->evaluate("a/@href", $div)[0];
it looks for an <a> element relative to the $div element. The [0] is there because you only want the first match.
$doc = new DOMDocument();
@$doc->loadHTML($result);
$xpath = new DOMXPath($doc);
$divs = $xpath->query("//div[contains(@class, 'c-wrapper')]");
echo $divs->length;
foreach ( $divs as $div ) {
    $link_extracted = $xpath->evaluate("a/@href", $div)[0];
    $title = $xpath->evaluate("descendant::div[contains(@class, 'c-link-data-title')]/h4/text()"
        , $div)[0];
    if ( !empty($link_extracted->nodeValue) ) {
        echo "<a href='" . $link_extracted->nodeValue . "'>" . $title->textContent . "</a><br>";
    }
    else {
        echo $title->textContent . "<br>";
    }
}
which for your test HTML gives...
2<a href='link-to-a-page.php'>TITLE</a><br>TITLE 2<br>
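If you would rather get plain strings back than nodes, the same context-relative idea can be sketched with the XPath string() and normalize-space() functions, so a missing link simply comes back as an empty string. The trimmed-down $result below and the $lines array are assumptions made for this example; htmlspecialchars() is added only to keep the output safe.

```php
<?php
// Reduced, hypothetical version of the markup from the question.
$result = <<<HTML
<div class="c-wrapper">
  <a href="link-to-a-page.php">
    <div class="c-link-data">
      <div class="c-link-data-title"><h4>TITLE</h4></div>
    </div>
  </a>
</div>
<div class="c-wrapper">
  <div class="c-link-data">
    <div class="c-link-data-title"><h4>TITLE 2</h4></div>
  </div>
</div>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($result);
$xpath = new DOMXPath($doc);

$lines = [];
foreach ($xpath->query("//div[contains(@class, 'c-wrapper')]") as $div) {
    // string() yields '' when the wrapper has no direct <a> child.
    $href  = $xpath->evaluate("string(a/@href)", $div);
    // normalize-space() trims the title text of the first matching <h4>.
    $title = $xpath->evaluate(
        "normalize-space(.//div[contains(@class, 'c-link-data-title')]/h4)",
        $div
    );
    $lines[] = $href !== ''
        ? "<a href='" . htmlspecialchars($href) . "'>" . htmlspecialchars($title) . "</a>"
        : htmlspecialchars($title);
}

echo implode("<br>", $lines), "<br>";
```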

How to get url in <a> tag with Simple HTML DOM Parser?

<ul id="PrList" class="v2">
<li class="tools">
</li>
<li class="firstRow">
<div class="i">
<a href="www.google.com" title="Google" class="nC">
<img src="something">
</a>
</div>
</li>
</ul>
How to get just href attribute in <div class="i">?
I tried this-
$html = file_get_html($link);
$urls = [];
foreach($html->find('.i') as $element) {
$url = $element->find('.nC')->href;
if (!in_array($url, $urls)) {
echo $url . "<br/>";
$urls[] = $url;
}
}
but I received an error:-
Notice: Trying to get property of non-object
and I tried:-
$html = file_get_html($link);
$html = $html->find('div.i');
$html -> find('a',0)->href;
$echo $html;
but I received an error again:-
Fatal error: Call to a member function find() on array
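find() without an index returns an array of elements, so you cannot read ->href from its result directly. Select the anchors inside .i and read href from each one: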
$html = file_get_html($link);
$urls = [];
foreach($html->find('.i a') as $element) {
$url = $element->href;
if (!in_array($url, $urls)) {
echo $url . "<br/>";
$urls[] = $url;
}
}
echo "<pre>"; print_r($urls);
Try this loop.
foreach($html->find('div[class=i] a') as $a){
var_dump($a->attr);
}
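If Simple HTML DOM is not a hard requirement, roughly the same extraction can be sketched with PHP's built-in DOM extension. This is not the library's API, just an alternative, assuming the markup looks like the sample in the question; the inline $markup string stands in for the downloaded page.

```php
<?php
// Stand-in for the downloaded page, copied from the question's sample.
$markup = <<<HTML
<ul id="PrList" class="v2">
<li class="tools"></li>
<li class="firstRow">
<div class="i">
<a href="www.google.com" title="Google" class="nC">
<img src="something">
</a>
</div>
</li>
</ul>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

$urls = [];
// Select the href attribute of every <a> inside <div class="i">.
foreach ($xpath->query('//div[@class="i"]//a/@href') as $attr) {
    if (!in_array($attr->nodeValue, $urls, true)) {
        $urls[] = $attr->nodeValue;
    }
}

print_r($urls);
```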

Unable to pull out the node value of src using getAttribute

I am trying to echo the href and the image src using getAttribute. The href is echoed correctly, but I am unable to retrieve the image src. Please guide me. Below is my HTML mockup:
<div id="hot-deals">
<div class="all-deals">
<ul>
<li><a href="http://url1.com">
<img src="http://imagelink1.com"></a>
</li>
<li><a href="http://url2.com">
<img src="http://imagelink2.com"></a>
</li>
<li><a href="http://url3.com">
<img src="http://imagelink3.com"></a>
</li>
</ul>
</div>
</div>
My code:
$nodes = $my_xpath->query( '//div[@id="hot-deals"]/div[@class="all-deals"]/ul/li/a' );
foreach( $nodes as $node )
{
$title=$node->getAttribute('href');
$img=$node->getAttribute('img/src');
echo $title.",".$img."<br>";
}
src is not an attribute of the <a> tag, so you need one more step: get the inner <img> element first and then read its src attribute.
foreach( $nodes as $node ) {
$title = $node->getAttribute('href');
$imgTags = $node->getElementsByTagName('img');
$img = $imgTags->item(0)->getAttribute('src');
echo $title . "," . $img . "<br>";
}
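One thing to watch with item(0): it returns null when a link has no image, and calling getAttribute() on null is a fatal error. A minimal sketch with a guard, assuming a shortened version of the mockup where the second link deliberately has no <img>; the $rows array is only there to collect the pairs:

```php
<?php
// Shortened, hypothetical mockup: the second link has no image.
$html = <<<HTML
<div id="hot-deals">
<div class="all-deals">
<ul>
<li><a href="http://url1.com"><img src="http://imagelink1.com"></a></li>
<li><a href="http://url2.com">text only</a></li>
</ul>
</div>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$my_xpath = new DOMXPath($doc);

$nodes = $my_xpath->query('//div[@id="hot-deals"]/div[@class="all-deals"]/ul/li/a');

$rows = [];
foreach ($nodes as $node) {
    $title = $node->getAttribute('href');
    // item(0) is null when the anchor contains no <img> element.
    $imgTag = $node->getElementsByTagName('img')->item(0);
    $img = $imgTag !== null ? $imgTag->getAttribute('src') : null;
    $rows[] = [$title, $img];
    echo $title . "," . ($img ?? '(no image)') . "<br>";
}
```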
You can try this code.
<?php
$str = '<div id="hot-deals">
<div class="all-deals">
<ul>
<li><a href="http://url1.com">
<img src="http://imagelink1.com"></a>
</li>
<li><a href="http://url2.com">
<img src="http://imagelink2.com"></a>
</li>
<li><a href="http://url3.com">
<img src="http://imagelink3.com"></a>
</li>
</ul>
</div>
</div>';
$dom = new DOMDocument;
$dom->loadHTML($str); // loadHTML() has to be called on an instance, not statically
$nodes = simplexml_import_dom($dom)->xpath('//div[@id="hot-deals"]/div[@class="all-deals"]/ul/li/a');
foreach( $nodes as $node )
{
$title = $node['href'];
$src = $node->img['src'];
echo $title ." " . $src . '<br>';
}

PHP searching with XPath

I have the following content:
<div class="item">
<a href="ONE">
<img src="TWO">
</a>
</div>
I want to use XPath to pull out "ONE" and "TWO" from there.
The code I have right now is:
$html = file_get_contents($_POST['url']);
$document = new DOMDocument();
$document->loadHTML ($html);
$selector = new DOMXPath($document);
$query = '//div[@class="item"]';
$anchors = $selector->query($query);
foreach ($anchors as $node) {
// print ONE;
// print TWO;
}
Here comes an example:
$html = <<<EOF
<div class="item">
<a href="ONE">
<img src="TWO">
</a>
</div>
EOF;
$doc = new DOMDocument();
$doc->loadHTML($html);
$selector = new DOMXPath($doc);
$links = $selector->query(
    '//div[@class="item"]//@href | //div[@class="item"]//@src'
);
foreach($links as $link) {
echo $link->nodeValue . PHP_EOL;
}
If you want to break it down by <div class="item"> you can use the following code:
foreach($selector->query('//div[@class="item"]') as $div) {
    foreach($selector->query('.//@href | .//@src', $div) as $link) {
        echo $link->nodeValue . PHP_EOL;
    }
}

Remove all list elements except the first in PHP?

How do I remove all the li elements except the first in PHP?
<div class="category">
<ul class="products">
<li>{nested child elements}</li>
<li>{nested child elements}</li>
<li>{nested child elements}</li>
</ul>
</div>
The code above is generated by another script via a function.
The result should be like this:
<div class="category">
<ul class="products">
<li>{nested child elements}</li>
</ul>
</div>
UPDATE: Sorry guys the "category" is a class not an ID.
In reply to Yoshi: ul.products has siblings, but I didn't include them in my post. Would that affect the query?
This is what my code looks like with Yoshi's code added:
class Myclass {
function prin_html() {
$content = get_code();
$dom = new DOMDocument;
$dom->loadXml($content);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//li[position()>1]') as $liNode) {
$liNode->parentNode->removeChild($liNode);
}
echo $dom->saveXml($dom->documentElement);
}
}
It still prints the unfiltered HTML code...
Try:
$dom = new DOMDocument;
$dom->loadHtml('<div id="category">
<ul class="products">
<li>{nested child elements}</li>
<li>{nested child elements}</li>
<li>{nested child elements}</li>
</ul>
</div>');
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//li[position()>1]') as $liNode) {
$liNode->parentNode->removeChild($liNode);
}
$output = '';
foreach ($xpath->query('//body/*') as $child) {
$output .= $dom->saveXml($child);
}
Output:
<div id="category">
<ul class="products">
<li>{nested child elements}</li>
</ul>
</div>
You could use DOMDocument.
$dom = new DOMDocument;
$dom->loadHTML($html);
$ul = $dom->getElementById('category')->getElementsByTagName('ul')->item(0);
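// getElementsByTagName() returns a live list, so copy it before removing nodes.
foreach(iterator_to_array($ul->getElementsByTagName('li')) as $index => $li) {
    if ($index == 0) {
        continue;
    }
    $ul->removeChild($li);
}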
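Since the update says "category" is a class and ul.products has siblings, here is a sketch that scopes the XPath query to that one list, so other lists are left alone. The sibling ul.other and the text inside the li elements are invented for the example. Unlike getElementsByTagName(), the list returned by DOMXPath::query() is not live, so removing nodes while looping over it is fine.

```php
<?php
// Hypothetical input: ul.products plus a sibling list that must stay intact.
$content = <<<HTML
<div class="category">
<ul class="other"><li>a</li><li>b</li></ul>
<ul class="products"><li>one</li><li>two</li><li>three</li></ul>
</div>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($content);
$xpath = new DOMXPath($dom);

// The concat/normalize-space idiom matches a whole class token, so
// "category" does not accidentally match e.g. "subcategory".
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' category ')]"
       . "/ul[contains(concat(' ', normalize-space(@class), ' '), ' products ')]"
       . "/li[position() > 1]";

foreach ($xpath->query($query) as $liNode) {
    $liNode->parentNode->removeChild($liNode);
}

// Collect what is left to check the result.
$remaining = [];
foreach ($xpath->query('//li') as $liNode) {
    $remaining[] = $liNode->textContent;
}

echo implode(', ', $remaining), PHP_EOL;
```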
