PHP Simple HTML DOM: select inner text only, not child inner text - php

I'm using the PHP Simple HTML DOM class to parse HTML.
I want to select only the inner text of the div with id="content", but when I call $selector->plaintext, it also returns the text of the sub-divs.
Sample HTML
<div id="content">
Hello World.
<div id="sub-content1">
Text I don't want to select.
</div>
<div id="sub-content2">
Text I don't want to select
</div>
</div>
Sample code
//suppose $html contains above html
$selector = $html->find("div#content", 0);
echo $selector->innertext;
//it outputs "Hello World. Text I don't want to select. Text I don't want to select"
//but I want only "Hello World."

include_once('simple_html_dom.php');
$html = new simple_html_dom();
$text = '<div id="content">
Hello World.
<div id="sub-content1">
Text I don\'t want to select.
</div>
<div id="sub-content2">
Text I don\'t want to select
</div>
</div>';
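$html->load($text);
// Blank out every descendant element of div#content.
$selector = $html->find("div#content", 0)->find("*");
foreach ($selector as $node) {
    $node->outertext = '';
}
// Reload the modified markup so the emptied elements are gone from the tree.
$html->load($html->save());
$selector = $html->find("div#content", 0);
echo $selector->innertext;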
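An alternative that avoids modifying and reloading the document is to read only the text nodes that are direct children of the div. The following is a minimal sketch that swaps in PHP's built-in DOMDocument and DOMXPath classes in place of Simple HTML DOM; the variable names are illustrative.

```php
<?php
// Sketch using PHP's built-in DOM extension instead of Simple HTML DOM.
$markup = '<div id="content">
Hello World.
<div id="sub-content1">
Text I don\'t want to select.
</div>
<div id="sub-content2">
Text I don\'t want to select
</div>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

// text() matches only text nodes that are direct children of div#content,
// so text inside the nested divs is never visited.
$ownText = '';
foreach ($xpath->query('//div[@id="content"]/text()') as $textNode) {
    $ownText .= $textNode->nodeValue;
}

// Strip the line breaks that surrounded the child elements.
$ownText = trim($ownText);
echo $ownText, "\n"; // Hello World.
```

Because the tree is left untouched, the sub-divs remain available for later queries.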

Related

Replace class content using php

I want to replace the content of specific classes in an HTML string.
The HTML also contains other content that I don't want to change.
In the code below I want to change the content of classes one and three only; the content of class two should stay as it is.
I need to do this dynamically.
<div class="one"> I want to change this </div>
<div class="two"> I don't want to change this </div>
<div class="three"> I want to change this </div> 
The DOM functions are helpful here; see the PHP manual.
//your html file content
$str = '...<div class="one"> I want to change this </div>
<div class="two"> I don\'t want to change this </div>
<div class="three"> I want to change this </div>... ';
$dom = new DOMDocument();
$dom->loadHtml($str);
$domXpath = new DOMXPath($dom);
//query the nodes matched
$list = $domXpath->query('//div[@class!="two"]');
if ($list->length > 0) {
foreach ($list as $node) {
//change node value
$node->nodeValue = 'Content changed!';
}
}
//get the result
$new_str = $dom->saveHTML();
var_dump($new_str);
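If the classes to change are known up front, the XPath predicate can also be built from a list rather than by excluding one class. This is a sketch under that assumption; $targetClasses is a hypothetical name, and the comparison matches the class attribute exactly, so an element carrying several classes would need a different predicate.

```php
<?php
$str = '<div class="one"> I want to change this </div>
<div class="two"> I don\'t want to change this </div>
<div class="three"> I want to change this </div>';

// Hypothetical list of classes whose content should be replaced.
$targetClasses = ['one', 'three'];

$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

// Build a predicate such as: @class="one" or @class="three"
$conditions = array_map(
    function ($class) {
        return '@class="' . $class . '"';
    },
    $targetClasses
);
$query = '//div[' . implode(' or ', $conditions) . ']';

// Replace the content of every matching div.
foreach ($xpath->query($query) as $node) {
    $node->nodeValue = 'Content changed!';
}

$result = $dom->saveHTML();
echo $result;
```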

PHP + Simple HTML DOM: Select a tag inside a div with class inside another div with class

In PHP Simple HTML DOM, how do I select all <strong> tags that are inside a div with class abc, which is itself inside a div with class 123?
<div class="123">
<div class="abc">
<strong>Text</strong>
</div>
</div>
You need to use a selector like div.123 div.abc strong and get the first element of the result. Here is a working example:
<?php
require 'simple_html_dom.php';
$html =<<<html
<div class="123">
<div class="abc">
<strong>Text</strong>
</div>
</div>
html;
$dom = str_get_html($html);
$el = $dom->find('div.123 div.abc strong', 0);
print $el;
print "\n";
print $el->innertext;
Result:
<strong>Text</strong>
Text
You can refer to the manual for a better understanding of how selectors work.
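For comparison, the same nested lookup can be written as an XPath query with PHP's built-in DOM extension. This is only a sketch of an equivalent approach, not part of Simple HTML DOM, and it collects every match instead of just the first one.

```php
<?php
$markup = '<div class="123">
<div class="abc">
<strong>Text</strong>
</div>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

// "//" between the steps allows any depth of nesting, like the
// descendant combinator (a space) in the CSS-style selector.
$strongTexts = [];
foreach ($xpath->query('//div[@class="123"]//div[@class="abc"]//strong') as $strong) {
    $strongTexts[] = $strong->textContent;
}

print_r($strongTexts);
```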

simple html dom to exclude paragraph with class

I have this markup and I want the first paragraph as output. I tried to filter by paragraph, but I am getting the second paragraph.
I am only interested in the first paragraph's text.
<div class="bq_fq_lrg" style="margin:0px">
<p>this text i want.</p>
<p class="bq_fq_a">
this text i dont want.
</p>
</div>
I tried this, but it gives me the second paragraph:
foreach($html->find('div.bq_fq_lrg p[0]') as $e)
The $html variable is an instance of SimpleHtmlDom
I am getting the content of the paragraph like this:
$op1 = $e->innertext . '<br>';
You can use ! in an attribute selector to match only elements that do not have that attribute. Here, p[!class] selects the paragraph without a class. Consider this example:
include 'simple_html_dom.php';
$html_string = '<div class="bq_fq_lrg" style="margin:0px">
<p>this text i want.</p>
<p class="bq_fq_a">
this text i dont want.
</p>
</div>';
$html = str_get_html($html_string);
foreach($html->find('div.bq_fq_lrg p[!class]') as $value) {
echo $value->innertext; // this text i want.
}
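The same "no class attribute" filter can be expressed in XPath with not(@class). The sketch below assumes PHP's built-in DOMDocument and DOMXPath in place of Simple HTML DOM.

```php
<?php
$htmlString = '<div class="bq_fq_lrg" style="margin:0px">
<p>this text i want.</p>
<p class="bq_fq_a">
this text i dont want.
</p>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($htmlString);
$xpath = new DOMXPath($dom);

// not(@class) keeps only the paragraphs that carry no class attribute.
$wanted = [];
foreach ($xpath->query('//div[@class="bq_fq_lrg"]/p[not(@class)]') as $paragraph) {
    $wanted[] = trim($paragraph->textContent);
}

print_r($wanted);
```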

Zend_Dom_Query multiple selects

I'm trying to use Zend_Dom_Query to get some specific content from a webpage.
I got a query working that gets the content of one DOM element. Now I want to select a second DOM element and get its content as well.
This is the html:
<div class="blocks">
<div class="w2">
<h2>Some title</h2>
<p>some text</p>
<p>more text</p>
<p class="more-info">link</p>
</div>
<div class="w2">
<h2>Some title</h2>
<p>some text</p>
<p>more text</p>
<p class="more-info">link</p>
</div>
</div>
My code so far:
$client = new Zend_Http_Client();
$client->setUri('http://awsomewebsite');
$result = $client->request('GET');
$response = $result->getBody();
$dom = new Zend_Dom_Query($response);
foreach ($dom->query('div.w2') as $content) {
echo $content->getElementsByTagName('h2')->item(0)->nodeValue; // this gives me the h2 value
echo $content->getElementsByTagName('a')->item(0)->getAttribute('href');
}
The problem is that this solution stops working when a block contains more than one anchor link. My question is: what is the correct way to select multiple elements? Or can I run a new query inside this foreach to select the right element?
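foreach ($dom->query('div.w2') as $content) {
    // $content is the current div.w2, so index 0 is always its own h2
    echo $content->getElementsByTagName('h2')->item(0)->nodeValue;
    // loop over every anchor in this block instead of reading a fixed index
    foreach ($content->getElementsByTagName('a') as $link) {
        echo $link->getAttribute('href');
    }
}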
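On the second part of the question, running a new query inside the loop is possible when the query is made relative to the current block. The sketch below illustrates that idea with PHP's built-in DOMXPath and not with Zend_Dom_Query; the anchors and href values in the markup are assumptions added for illustration, since the sample HTML in the question does not show them.

```php
<?php
// Hypothetical markup: the anchors and their href values are assumptions.
$markup = '<div class="blocks">
<div class="w2">
<h2>First title</h2>
<p class="more-info"><a href="/first">link</a> <a href="/first-extra">more</a></p>
</div>
<div class="w2">
<h2>Second title</h2>
<p class="more-info"><a href="/second">link</a></p>
</div>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

$blocks = [];
foreach ($xpath->query('//div[@class="w2"]') as $block) {
    // The leading "." keeps each query relative to the current block.
    $title = $xpath->query('.//h2', $block)->item(0)->textContent;

    $links = [];
    foreach ($xpath->query('.//p[@class="more-info"]/a', $block) as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }

    $blocks[] = ['title' => $title, 'links' => $links];
}

print_r($blocks);
```

Passing the block as the second argument to query() scopes the search, so a block with several anchors yields all of them and a block with one anchor yields just that one.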

Count Similar Div : Simple html dom

I have an HTML layout like:
<div id="pageno">1</div>
<div id="pageno">2</div>
<div id="pageno">3</div>
<div id="pageno">4</div>
<div id="pageno">5</div>
Using an HTML DOM parser, how can I get the inner text of the last div?
Thanks in advance
// Create a new DomDocument.
$dom = new DomDocument();
// Load your HTML into it.
$dom->loadHTML('
<div id="pageno">1</div>
<div id="pageno">2</div>
<div id="pageno">3</div>
<div id="pageno">4</div>
<div id="pageno">5</div>
');
// Obtain a list of the DIVs.
$divList = $dom->getElementsByTagName("div");
// Obtain the last element of the list.
$lastDiv = $divList->item($divList->length - 1);
// Output the inner text.
echo $lastDiv->nodeValue;
However, the HTML you have provided is not valid, because element IDs must be unique. loadHTML may emit warnings about the duplicate IDs.
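If other divs may appear on the page, the lookup can be narrowed to the pageno divs and the parser warnings kept out of the output. This is a sketch that assumes DOMXPath is acceptable here; libxml_use_internal_errors() collects the libxml warnings instead of printing them.

```php
<?php
$markup = '
<div id="pageno">1</div>
<div id="pageno">2</div>
<div id="pageno">3</div>
<div id="pageno">4</div>
<div id="pageno">5</div>
';

// Collect libxml warnings (such as duplicate IDs) instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// The parentheses make last() apply to the whole set of matching divs.
$lastDiv = $xpath->query('(//div[@id="pageno"])[last()]')->item(0);
$lastText = $lastDiv->textContent;

echo $lastText, "\n"; // 5
```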
