getting element content with simple-html-dom

I'm using simple_html_dom to get elements from HTML pages.
I have some div elements like this. All I want is to get the "Fine Thanks" sentence in each div (the text that is not inside any sub-element).
How can I do it?
<div class="right">
<h2>
Hello
</h2>
<br/>
<span>How Are You?</span>
<span>How Are You?</span>
<span>How Are You?</span>
Fine Thanks
</div>

It should be simply $html->find('div.right > text'), but that won't work because Simple HTML DOM Parser doesn't seem to support direct descendant queries.
So you'd have to find all <div> elements first and search the child nodes for a text node. Unfortunately, the ->childNodes() method is mapped to ->children() and thus only returns elements.
A working solution is to call ->find('text') on each <div> element, after which you filter the results based on the parent node.
foreach ($doc->find('div.right') as $parent) {
foreach ($parent->find('text') as $node) {
if ($node->parent() === $parent && strlen($t = trim($node->plaintext))) {
echo $t, PHP_EOL;
}
}
}
Using DOMDocument, this XPath expression will do the same work without the pain:
$doc = new DOMDocument;
$doc->loadHTML($content);
$xp = new DOMXPath($doc);
foreach ($xp->query('//div/text()') as $node) {
if (strlen($t = trim($node->textContent))) {
echo $t, PHP_EOL;
}
}
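Note that //div/text() matches text under every div, not only div.right. A minimal sketch that narrows the query to one class, assuming markup like the question's; the helper name directTextOfClass is made up for illustration, and $class is assumed to be a plain class token without quotes:

```php
<?php
// Hypothetical helper: returns the non-empty text nodes that are direct
// children of <div> elements carrying the given class.
function directTextOfClass(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    // Token-aware class test: class="right wide" matches, class="upright" does not.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]/text()',
        $class
    );

    $texts = [];
    foreach ($xp->query($query) as $node) {
        $t = trim($node->textContent);
        if ($t !== '') {
            $texts[] = $t;
        }
    }
    return $texts;
}

$html = '<div class="right"><h2>Hello</h2><br/><span>How Are You?</span>Fine Thanks</div>'
      . '<div class="left">Ignored</div>';
print_r(directTextOfClass($html, 'right'));
```

The /text() step only selects direct children, so text inside the h2 and span elements is skipped without any extra filtering.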

There is no built-in method in simple_html_dom.php for reading only an element's own text.
But this should work:
include 'parser.php';
$html = str_get_html('<div class="right">
<h2>
Hello
</h2>
<br/>
<span>How Are You?</span>
<span>How Are You?</span>
<span>How Are You?</span>
Fine Thanks
</div>');
function readTextNode($element){
$local = $element;
$childs = count($element->childNodes());
for($i = 0; $i < $childs; $i++)
$local->childNodes($i)->outertext = '';
return $local->innertext;
}
echo readTextNode($html->find('div.right',0));

I would switch to phpQuery for this one. You still need to use the DOM, but it's not too painful:
require('phpQuery.php');
$html =<<<EOF
<div class="right">
<h2>
Hello
</h2>
<br/>
<span>How Are You?</span>
<span>How Are You?</span>
<span>How Are You?</span>
Fine Thanks
</div>
EOF;
$dom = phpQuery::newDocumentHTML($html);
foreach($dom->find("div.right > *:last") as $last_element){
echo $last_element->nextSibling->nodeValue;
}
Update
These days I'm recommending this simple replacement which does let you avoid the dom ugliness:
$doc = str_get_html($html);
foreach($doc->find('div.right > text:last') as $el){
echo $el->text;
}

public function removeNode($selector)
{
foreach ($this->find($selector) as $node)
{
$node->outertext = '';
}
$this->load($this->save());
}
Use this function (written as a method of the simple_html_dom class) to remove the h2 and span elements from the div. Then get the div element's content.
Reference URL : Simple HTML Dom: How to remove elements?
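For comparison, the same "remove the child elements, then read what is left" idea can be sketched with PHP's built-in DOMDocument instead of Simple HTML DOM. This is a swapped-in technique, not the method above; the helper name ownText is hypothetical:

```php
<?php
// Hypothetical helper: strips every child element from a copy of the node
// and returns the text that remains.
function ownText(DOMNode $element): string
{
    // Work on a deep copy so the original document is left intact.
    $copy = $element->cloneNode(true);
    // Iterate over a snapshot: childNodes is live and shifts while nodes are removed.
    foreach (iterator_to_array($copy->childNodes) as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $copy->removeChild($child);
        }
    }
    return trim($copy->textContent);
}

$doc = new DOMDocument();
$doc->loadHTML('<div class="right"><h2>Hello</h2><br/><span>How Are You?</span> Fine Thanks </div>');
$div = $doc->getElementsByTagName('div')->item(0);
echo ownText($div), PHP_EOL;
```

Cloning first means the h2 and span elements are still in the document afterwards, which matters if the same page is queried again.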

Related

Replace class content using php

I want to replace the string inside specific classes in some HTML.
The HTML also contains other content which I don't want to change.
In the code below I want to change the data in classes one and three only; the content of class two should stay as it is.
I need to do this in a dynamic way.
<div class="one"> I want to change this </div>
<div class="two"> I don't want to change this </div>
<div class="three"> I want to change this </div>

The DOM functions are helpful here; see the PHP manual.
//your html file content
$str = '...<div class="one"> I want to change this </div>
<div class="two"> I don\'t want to change this </div>
<div class="three"> I want to change this </div>... ';
$dom = new DOMDocument();
$dom->loadHtml($str);
$domXpath = new DOMXPath($dom);
//query the nodes matched
$list = $domXpath->query('//div[@class!="two"]');
if ($list->length > 0) {
foreach ($list as $node) {
//change node value
$node->nodeValue = 'Content changed!';
}
}
//get the result
$new_str = $dom->saveHTML();
var_dump($new_str);
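The query above changes every div whose class is anything other than "two". If only a known set of classes should change, a whitelist may be safer. A rough sketch, assuming the class names are plain tokens; replaceClassContent is a hypothetical helper, not a library function:

```php
<?php
// Hypothetical helper: replaces the content of every <div> whose class list
// contains one of $classes and returns the resulting HTML document.
function replaceClassContent(string $html, array $classes, string $replacement): string
{
    if ($classes === []) {
        return $html; // nothing to match, leave the input untouched
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Build one token-aware class test per requested class.
    $tests = [];
    foreach ($classes as $class) {
        $tests[] = sprintf('contains(concat(" ", normalize-space(@class), " "), " %s ")', $class);
    }

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[' . implode(' or ', $tests) . ']') as $node) {
        $node->nodeValue = $replacement; // same assignment as in the answer above
    }
    return $dom->saveHTML();
}

$str = '<div class="one"> I want to change this </div>'
     . '<div class="two"> I don\'t want to change this </div>'
     . '<div class="three"> I want to change this </div>';
echo replaceClassContent($str, ['one', 'three'], 'Content changed!');
```

As with the original, saveHTML() returns a full document, so the result is wrapped in html and body tags.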

simple html dom traversal confusion when looping

I'm trying to use the php script simplehtmldom to loop over divs on a web page while scraping.
Right now I have this:
$url = "https://test.com/";
$html = new simple_html_dom();
$html->load_file($url);
$item_list = $html->find('div.main div[id]');
foreach ($item_list as $item)
{
echo $item->outertext . PHP_EOL;
}
This will give me many like this (from the echo in the loop above):
<div id=1>
<div>
stuff here
</div>
<div>
<span class="title">name</span>
</div>
</div>
<div id=2>
<div>
stuff here
</div>
<div>
<span class="title">name 2</span>
</div>
</div>
What I'm trying to do is loop over the span with class=title, but no matter what I can't seem to quite get the right selector. Could someone help me out?

You can get the spans by adding span[class=title] to the selector:
$item_list = $html->find('div.main div[id] span[class=title]');
foreach ($item_list as $item)
{
echo $item->outertext . PHP_EOL;
}
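If each title needs to stay paired with the div it came from, one option is to loop over the items and run a relative query inside each. Below is a sketch using the built-in DOMDocument and DOMXPath rather than Simple HTML DOM; titlesPerItem is a hypothetical helper, and it assumes the wrapper's class attribute is exactly "main":

```php
<?php
// Hypothetical helper: lists the title found inside each item div.
function titlesPerItem(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    $titles = [];
    foreach ($xp->query('//div[@class="main"]//div[@id]') as $item) {
        $titles[] = [
            'id'    => $item->getAttribute('id'),
            // The leading dot keeps the search inside the current item.
            'title' => trim($xp->evaluate('string(.//span[@class="title"])', $item)),
        ];
    }
    return $titles;
}

$page = '<div class="main">'
      . '<div id="1"><div>stuff here</div><div><span class="title">name</span></div></div>'
      . '<div id="2"><div>stuff here</div><div><span class="title">name 2</span></div></div>'
      . '</div>';
print_r(titlesPerItem($page));
```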

Weird HTML DOM produced by PHP

I'm retrieving rss feed of blogs with this code
<?php
$xml = ("https://serembangirl.wordpress.com/feed/");
$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);
$x=$xmlDoc->getElementsByTagName('item');
for ($i=0; $i<=5; $i++) {
$item_title=$x->item($i)->getElementsByTagName('title')
->item(0)->childNodes->item(0)->nodeValue;
$item_link=$x->item($i)->getElementsByTagName('link')
->item(0)->childNodes->item(0)->nodeValue;
$item_desc=$x->item($i)->getElementsByTagName('description')
->item(0)->childNodes->item(0)->nodeValue;
$item_content=$x->item($i)->getElementsByTagName('encoded')->item(0)->nodeValue;
?>
<a href='#'>
<div class="card">
<div class='inner'>
<p class='title'>
<?php echo $item_title;?>
</p>
<p class='desc'> <?php echo $item_desc; ?> </p>
</div>
</div>
</a>
<?php } ?>
With the above code, the <a> is supposed to wrap the <div>, but it produced this instead:
http://i.imgur.com/YspeRe3.png
I really scratched my head solving this.

I think div within anchor tag is not recommended.

Check the actual source code that is generated by PHP. It will have the div inside the a.
div, p or other block level elements are not allowed inside an a element. The browser tries to "fix" your document.
Hint 1
Use XPath to fetch data from the DOM.
$xpath = new DOMXPath($xmlDoc);
foreach ($xpath->evaluate('//item') as $item) {
$item_title = $xpath->evaluate('string(title)', $item);
// ...
}
Hint 2
Don't forget the escaping if you output data as HTML source.
...
<p class='title'>
<?php echo htmlspecialchars($item_title); ?>
</p>
...
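Putting both hints together, a rough sketch might look like the following. The function name renderFeedCards, the inline feed, and the simplified card markup are made up for illustration; a real feed would be loaded with $doc->load($url) as in the question:

```php
<?php
// Hypothetical helper: renders up to $limit feed items as escaped HTML cards.
function renderFeedCards(string $rssXml, int $limit = 6): string
{
    $doc = new DOMDocument();
    $doc->loadXML($rssXml);
    $xpath = new DOMXPath($doc);

    $out = '';
    $count = 0;
    foreach ($xpath->evaluate('//item') as $item) {
        if ($count++ >= $limit) {
            break;
        }
        // string() returns '' when the child element is missing, so no null checks are needed.
        $title = $xpath->evaluate('string(title)', $item);
        $desc  = $xpath->evaluate('string(description)', $item);

        $out .= '<div class="card">'
              . '<p class="title">' . htmlspecialchars($title, ENT_QUOTES) . '</p>'
              . '<p class="desc">' . htmlspecialchars($desc, ENT_QUOTES) . '</p>'
              . '</div>' . PHP_EOL;
    }
    return $out;
}

$rss = '<rss version="2.0"><channel>'
     . '<item><title>A &amp; B</title><description><![CDATA[<a href="#">x</a>]]></description></item>'
     . '</channel></rss>';
echo renderFeedCards($rss);
```

Because the description is escaped, any markup it carries is shown as text instead of being injected into the card.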

Fetching Image from particular div Only via DOMDocument in PHP

I have a website where I have posted a few images inside a particular div:
<div class="posts">
<div class="separator">
<img src="http://www.example.com/image.jpg" />
<p>Be, where I am today, and i will be one where you will search me tomorrow</p>
</div>
<div class="separator">
<img src="http://www.example.com/imagesda.jpg" />
<p>Be, where I am today, and i will be one where you will search me tomorrow</p>
</div>
.... few more images
</div>
From my second website, I want to fetch all the images in that particular div. I have the code below.
<?php
$htmlget = new DOMDocument();
@$htmlget->loadHtmlFile('http://www.example.com');
$xpath = new DOMXPath( $htmlget);
$nodelist = $xpath->query( "//img/@src" );
foreach ($nodelist as $images){
$value = $images->nodeValue;
echo "<img src='".$value."' /><br />";
}
?>
But this fetches all the images from my website, not just the ones in that particular div. It also prints out my RSS image, social icon images, etc.
Can I specify a particular div in my PHP code, so that it only fetches images from the div with class posts?

First give an id to the outer div container. Then get the container by its id, and then walk its child nodes.
An example:
$container = $dom->getElementById('node_id');
//get the number of child nodes in the container
echo $container->childNodes->length;
//content of each child
foreach($container->childNodes as $child)
{
echo $child->ownerDocument->saveHTML($child);
}
Maybe this link will help you. It has a good tutorial.
http://www.binarytides.com/php-tutorial-parsing-html-with-domdocument/

With PHP Simple HTML Parser, this will be:
include('simple_html_dom.php');
$html=file_get_html("http://your_web_site.com");
foreach($html->find('div.posts img') as $img_posts){
echo $img_posts->src . '<br>'; // to show the source attribute
}
I'm still reading about the PHP Simple HTML DOM Parser, but so far it has been faster (to implement) than regex.

Here is another piece of code that may help. You are looking for
$doc->getElementsByTagName
which can help target a tag directly.
<?php
$myhtml = <<<EOF
<html>
<body>
<div class="posts">
<div class="separator">
<img src="http://www.example.com/image.jpg" />
<p>Be, where I am today, and i will be one where you will search me tomorrow</p>
</div>
<div class="separator">
<img src="http://www.example.com/imagesda.jpg" />
<p>Be, where I am today, and i will be one where you will search me tomorrow</p>
</div>
.... few more images
</div>
</body>
</html>
EOF;
$doc = new DOMDocument();
$doc->loadHTML($myhtml);
$divs = $doc->getElementsByTagName('img');
foreach ($divs as $div) {
foreach ($div->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
echo "Attribute '$name' :: '$value'<br />";
}
}
?>
Demo here http://codepad.org/keZkC377
Also the answer here can provide further insights
Not finding elements using getElementsByTagName() using DomDocument
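The restriction can also be expressed directly in the question's own XPath approach, without adding an id. A minimal sketch, assuming the images always sit somewhere inside a div whose class list contains posts; postImageSources is a hypothetical helper name:

```php
<?php
// Hypothetical helper: returns the src of every <img> inside div.posts.
function postImageSources(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Token-aware class test, then any <img> below that div, then its src attribute.
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " posts ")]//img/@src';

    $sources = [];
    foreach ($xpath->query($query) as $src) {
        $sources[] = $src->nodeValue;
    }
    return $sources;
}

$page = '<img src="http://www.example.com/rss.png" />'
      . '<div class="posts">'
      . '<div class="separator"><img src="http://www.example.com/image.jpg" /></div>'
      . '<div class="separator"><img src="http://www.example.com/imagesda.jpg" /></div>'
      . '</div>';
print_r(postImageSources($page));
```

For a live page, the string passed in could come from file_get_contents() on the site's URL.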

PHP - GET tag from url

I want to get a specific tag from a URL. For example:
If I have this content:
<div id="hey">
<div id="bla"></div>
</div>
<div id="hey">
<div id="bla"></div>
</div>
And I want to get all divs with the id "hey" (I think it's done with preg_match_all). How can I do that?
The content inside the tag can be changed.

I recommend using the DOMDocument class instead of regular expressions (it consumes fewer resources and is clearer, IMHO).
$content = '<div id="hey">
<div id="bla"></div>
</div>
<div id="hey">
<div id="bla"></div>
</div>';
$doc = new DOMDocument();
@$doc->loadHTML($content); // @ suppresses warnings for possibly non-standard HTML
$xpath = new DOMXPath($doc);
$elements = $xpath->query("//div[@id='hey']");
/* @var $elements DOMNodeList */
for ($i=0;$i<$elements->length;$i++) {
/* @var $curr_element DOMElement */
$curr_element = $elements->item($i);
// Here do what you want with the element
var_dump($curr_element);
}
If you want to get the content from a URL, you can use this line instead to fill the variable $content:
$content = file_get_contents('http://yourserver/urls/page.php');
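To get each matching div back as an HTML string instead of a var_dump, something like the sketch below could work. It uses libxml_use_internal_errors() in place of the @ operator; outerHtmlById is a hypothetical helper, and $id is assumed to contain no double quotes:

```php
<?php
// Hypothetical helper: returns the outer HTML of every <div> with the given id.
function outerHtmlById(string $content, string $id): array
{
    $doc = new DOMDocument();
    // Duplicate ids are invalid HTML, so collect libxml's complaints quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpath->query(sprintf('//div[@id="%s"]', $id)) as $element) {
        // saveHTML() with a node argument serializes just that node and its children.
        $result[] = $doc->saveHTML($element);
    }
    return $result;
}

$content = '<div id="hey"><div id="bla"></div></div>'
         . '<div id="hey"><div id="bla"></div></div>';
print_r(outerHtmlById($content, 'hey'));
```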
