The page on another of my domains which I'd like to scrape one div from contains:
<div id="thisone">
<p>Stuff</p>
</div>
<div id="notthisone">
<p>More stuff</p>
</div>
Using this PHP...
<?php
$page = file_get_contents('http://thisite.org/source.html');
$doc = new DOMDocument();
$doc->loadHTML($page);
foreach ($doc->getElementsByTagName('div') as $node) {
echo $doc->saveHtml($node), PHP_EOL;
}
?>
...gives me all the divs on http://thisite.org/source.html, with their HTML. However, I only want to pull out the div with an id of "thisone", but using:
foreach ($doc->getElementById('thisone') as $node) {
doesn't bring up anything.
$doc->getElementById('thisone'); // returns a single element with id "thisone"
Try $node = $doc->getElementById('thisone'); and then print it with echo $doc->saveHTML($node);
On a side note, you can use phpQuery for a jQuery-like syntax: pq("#thisone")
$doc->getElementById('thisone') returns a single DOMElement (or NULL), not an array, so you can't iterate through it.
Just do:
$node = $doc->getElementById('thisone');
echo $doc->saveHtml($node), PHP_EOL;
See the PHP manual: http://php.net/manual/en/domdocument.getelementbyid.php
getElementById() returns an element or NULL, not an array, so you can't iterate over it.
Instead, do this:
<?php
$page = file_get_contents('example.html');
$doc = new DOMDocument();
$doc->loadHTML($page);
$node = $doc->getElementById('thisone');
echo $doc->saveHtml($node), PHP_EOL;
?>
Running php edit.php gives something like this:
<div id="thisone">
<p>Stuff</p>
</div>
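Here is a slightly more defensive version of the same idea, as a sketch. It parses an inline HTML string that stands in for the fetched page, and it checks for NULL before calling saveHTML(), because getElementById() returns NULL when no element has that id. The libxml_use_internal_errors() call is an assumption on my part: it silences the parser warnings that real-world markup often triggers.

```php
<?php
// Inline HTML standing in for file_get_contents('http://thisite.org/source.html').
$page = <<<HTML
<div id="thisone">
<p>Stuff</p>
</div>
<div id="notthisone">
<p>More stuff</p>
</div>
HTML;

// Collect libxml parse warnings instead of printing them (common with real pages).
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($page);

// getElementById() returns a single DOMElement, or NULL if nothing matches.
$node = $doc->getElementById('thisone');

if ($node === null) {
    echo 'No element with id="thisone"', PHP_EOL;
} else {
    // Serialize just this element (and its children) back to HTML.
    echo $doc->saveHTML($node), PHP_EOL;
}
```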
Related
Let's say I have this code. I want to fetch the contents of all <p> tags inside nested <div> tags. There can be 15 nested <div> tags, so I want to write a script that digs through all the divs and returns the <p> data from them.
<div>
<div>
<div>
<p>Hi</p>
</div>
<p>Hello</p>
</div>
<p>Hey</p>
</div>
Required output (any order):
Hi
Hello
Hey
I have attempted the following:
function divDigger($div)
{
$internalP = $div->getElementsByTagName('p');
echo $internalP->innertext;
$internalDiv = $div->getElementsByTagName('div');
if (count($internalDiv) > 0) {
foreach ($internalDiv as $div) {
divDigger($div);
}
}
}
You may use the XPath API for this:
$doc = new \DOMDocument();
$doc->loadHTML($yourHtml);
$xpath = new \DOMXPath($doc);
foreach ($xpath->query('//div//p') as $pWithinDiv) {
echo $pWithinDiv->textContent, PHP_EOL;
}
This will find any <p> element under a <div> (not necessarily directly under it, otherwise you can change the expression to //div/p), and display its text content.
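If you'd rather keep the recursive approach from the question, here is a corrected sketch. The original attempt had two problems. First, a DOMNodeList has no innertext property. Second, getElementsByTagName() already searches all descendants, so recursing on its results visits the same <p> elements more than once. This version walks childNodes instead and collects <p> text only once it is inside some <div>. Keeping the name divDigger and adding the $insideDiv flag are illustrative choices, not part of any API.

```php
<?php
/**
 * Recursively collect the text of every <p> that sits inside at least one <div>.
 * Walks childNodes (direct children only), so each element is visited exactly once.
 */
function divDigger(DOMNode $node, array &$found, bool $insideDiv = false): void
{
    foreach ($node->childNodes as $child) {
        if (!$child instanceof DOMElement) {
            continue; // skip text nodes, comments, the doctype, etc.
        }
        if ($child->nodeName === 'p' && $insideDiv) {
            $found[] = $child->textContent;
        }
        // Once we've entered a <div>, everything below it counts as "inside".
        divDigger($child, $found, $insideDiv || $child->nodeName === 'div');
    }
}

$html = '<div><div><div><p>Hi</p></div><p>Hello</p></div><p>Hey</p></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$found = [];
divDigger($doc, $found);

echo implode(PHP_EOL, $found), PHP_EOL; // Hi, Hello, Hey (one per line)
```

The XPath answer above is shorter. The recursive form is mainly useful if you need to do something different at each nesting level.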
Demo: https://3v4l.org/43QqX
I want to append a variable containing HTML text in PHP to a preloaded div element in the same file. I am using simpler examples to try to achieve what I want.
<?php
$htmlString = "<p>Hello World!</p>";
?>
$htmlString is generated by a PHP function, so here I just use a sample HTML string to mimic its output. I am trying to put $htmlString inside the div element:
<div id="demo"><h1>Test</h1></div>
I have tried the following but it does not work:
<?php
$dom = new domDocument;
$dom->loadHTML($html);
$div_tag = $dom->getElementById('demo');
echo $dom->saveHTML($div_tag);
?>
I want to produce this output:
<div id="demo"><h1>Test</h1><p>Hello World!</p></div>
You can drop into PHP between the HTML tags:
<div id="demo"><h1>Test</h1><?php echo $htmlString ?></div>
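If the div has to be modified through DOMDocument (for example, because the markup is itself a string), one option is a document fragment. The following is a sketch, assuming $htmlString is well-formed XML. DOMDocumentFragment::appendXML() parses XML, not HTML, so something like <br> would need to be written <br/>.

```php
<?php
$html = '<div id="demo"><h1>Test</h1></div>';
// Stand-in for the string your PHP function generates.
$htmlString = '<p>Hello World!</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$div = $dom->getElementById('demo');

// Parse the snippet into a fragment owned by $dom, then append it to the div.
$fragment = $dom->createDocumentFragment();
$fragment->appendXML($htmlString);
$div->appendChild($fragment);

// Serialize only the div, which now ends with the new <p>.
echo $dom->saveHTML($div), PHP_EOL;
```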
I have a layout like this:
<div class="fly">
<img src="a.png" class="badge">
<img class="aye" data-original="b.png" width="130" height="253" />
<div class="to">
<h4>Fly To The Moon</h4>
<div class="clearfix">
<div class="the">
<h4>Wow</h4>
</div>
<div class="moon">
<h4>Great</h4>
</div>
</div>
</div>
</div>
First I get the query from XPath:
$a = $xpath->query("//div[@class='fly']"); // to get all elements in class fly
foreach ($a as $p) {
$t = $p->getElementsByTagName('img');
echo ($t->item(0)->getAttributes('data-original'));
}
When I run the code, it produces 0 results. After tracing it, I found that <img class="badge"> is processed first. How can I get the data-original value from <img class="aye"> and also get the values "Wow" and "Great" from the <h4> tags?
Thank you,
Alternatively, you could run another XPath query relative to each matched div, on top of your current code.
To get the attribute, use ->getAttribute():
$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXpath($dom);
$parent_div = $xpath->query("//div[@class='fly']"); // to get all elements in class fly
foreach($parent_div as $div) {
$aye = $xpath->query('./img[@class="aye"]', $div)->item(0)->getAttribute('data-original');
echo $aye . '<br/>'; // get the data-original
$others = $xpath->query('./div[@class="to"]/div[@class="clearfix"]', $div)->item(0);
foreach($xpath->query('./div/h4', $others) as $node) {
echo $node->nodeValue . '<br/>'; // echo the two h4 values
}
echo '<hr/>';
}
Thank you for your code!
I tried the code but it failed, and I don't know why. So I changed your code a bit, and now it works!
$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXpath($dom);
$parent_div = $xpath->query("//div[@class='fly']"); // to get all elements in class fly
foreach($parent_div as $div) {
$aye = $xpath->query('descendant::img[@class="aye"]', $div)->item(0)->getAttribute('data-original');
echo $aye . '<br/>'; // get the data-original
$others = $xpath->query('descendant::div[@class="to"]/div[@class="clearfix"]', $div)->item(0);
foreach($xpath->query('.//div/h4', $others) as $node) {
echo $node->nodeValue . '<br/>'; // echo the two h4 values
}
echo '<hr/>';
}
I have no idea what the difference is between ./ and descendant::, but my code works fine using descendant::.
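The difference is the axis. ./img is short for child::img, so it only matches <img> elements that are direct children of the context node. descendant::img matches <img> elements at any depth below it, and .//img selects the same nodes. The sketch below shows this on a deliberately modified version of the markup, where the second <img> is wrapped in an extra div so that the two axes give different results. That change is an assumption made for illustration.

```php
<?php
// Modified markup: the "aye" image is nested one level deeper than the badge.
$markup = <<<HTML
<div class="fly">
  <img src="a.png" class="badge">
  <div class="to"><img class="aye" data-original="b.png"></div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

$div = $xpath->query("//div[@class='fly']")->item(0);

// child axis: only <img> elements directly under $div
$children = $xpath->query('./img', $div);
// descendant axis: <img> elements at any depth under $div
$descendants = $xpath->query('descendant::img', $div);
// abbreviated form that selects the same nodes as descendant::img
$abbrev = $xpath->query('.//img', $div);

echo $children->length, PHP_EOL;    // 1 (only the badge)
echo $descendants->length, PHP_EOL; // 2
echo $abbrev->length, PHP_EOL;      // 2
```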
Given the following XML:
<div class="fly">
<img src="a.png" class="badge">
<img class="aye" data-original="b.png" width="130" height="253" />
<div class="to">
<h4>Fly To The Moon</h4>
<div class="clearfix">
<div class="the">
<h4>Wow</h4>
</div>
<div class="moon">
<h4>Great</h4>
</div>
</div>
</div>
</div>
You asked:
how can I get data-original value from <img class="aye">and also get the value "Wow" and "Great" from <h4> tag?
With XPath you can obtain the values as string directly:
string(//div[@class='fly']/img/@data-original)
This is the string value of the first data-original attribute of an img tag directly inside the divs with class="fly".
string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[1])
string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[2])
These are the string values of the first and second <h4> tags (in document order) within the divs with class="fly" that have no following sibling containing another <h4>. The parentheses matter. Without them, [1] and [2] would be counted per parent element, and since each of these <h4> tags is the only one in its parent, [2] would match nothing.
The leading //div[@class='fly'] part may look like it gets in the way right now, but once you iterate over those divs it is no longer needed, because the XPath expressions become relative:
//div[@class='fly']
string(./img/@data-original)
string((.//h4[not(following-sibling::*//h4)])[1])
string((.//h4[not(following-sibling::*//h4)])[2])
To use XPath string(...) expressions in PHP, you must use DOMXPath::evaluate() instead of DOMXPath::query(). This would then look like the following:
$aye = $xpath->evaluate("string(//div[@class='fly']/img/@data-original)");
$h4_1 = $xpath->evaluate("string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[1])");
$h4_2 = $xpath->evaluate("string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[2])");
A full example with iteration and output:
// all <div> tags with class="fly"
$divs = $xpath->evaluate("//div[@class='fly']");
foreach ($divs as $div) {
// the first data-original attribute of an <img> directly inside $div
echo $xpath->evaluate("string(./img/@data-original)", $div), "<br/>\n";
// the innermost <h4> tags inside $div (no following sibling containing another <h4>)
$h4s = $xpath->evaluate('.//h4[not(following-sibling::*//h4)]', $div);
foreach ($h4s as $h4) {
echo $h4->nodeValue, "<br/>\n";
}
}
As the example shows, you can use evaluate() for node-lists too. The values of the <h4> tags are no longer fetched with string() here, since I assume there could be more than just two of them.
Printing the string results directly (just illustrative):
echo <<<HTML
{$xpath->evaluate("string(//div[@class='fly']/img/@data-original)")}<br/>
{$xpath->evaluate("string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[1])")}<br/>
{$xpath->evaluate("string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[2])")}<br/>
<hr/>
HTML;
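Be careful with positional predicates. In //div[@class='fly']//h4[...][2], the [2] binds to the h4 step, so it means "the second matching <h4> child of its parent", not "the second match overall". The sketch below compares both forms on the question's markup (with the ** markers removed). Wrapping the path in parentheses applies the position to the whole node-set in document order.

```php
<?php
$markup = <<<HTML
<div class="fly">
<img src="a.png" class="badge">
<img class="aye" data-original="b.png" width="130" height="253" />
<div class="to">
<h4>Fly To The Moon</h4>
<div class="clearfix">
<div class="the">
<h4>Wow</h4>
</div>
<div class="moon">
<h4>Great</h4>
</div>
</div>
</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

$inner = "//div[@class='fly']//h4[not(following-sibling::*//h4)]";

// [2] per parent: each parent has only one matching <h4>, so nothing matches.
$perParent = $xpath->evaluate('string(' . $inner . '[2])');

// (...)[2]: the second match across the whole result, in document order.
$whole = $xpath->evaluate('string((' . $inner . ')[2])');

var_dump($perParent); // string(0) ""
var_dump($whole);     // string(5) "Great"
```

The unparenthesized [1] happens to return "Wow" here only because string() takes the first node of a result that contains both <h4> tags.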
Here is the div code on several different domains. I want to display the total on my homepage. I tried using file_get_html, but it returns the whole div content. I want to save the number inside <dd></dd> in a variable, add the numbers up, and display the total on my page.
Here is the div code:
<div class="stats">
<dl class="statscount">
<dt>total:</dt>
<dd>5,299</dd>
</dl>
20000
</div>
And here is my current code:
<?php
include 'simple_html_dom.php';
$html = file_get_html('http://www.targetdomain.com');
$result = $html->find('dl[class=statscount]', 0); //Output: THESE
$result = str_replace(",", "", $result);
echo $result;
?>
But there is a small problem: I don't need to fetch all the data in the class, just the data in the <dd></dd> tag within it. Can you please tell me how to achieve this? Basically, I want to fetch the number inside <dd>5,299</dd>, add up the numbers from the different pages, and display the total on my website. Thanks
I would use XPath for this. That way you won't need simple_html_dom, because DOM and XPath are part of the PHP 5 core:
$html = <<<EOF
<div class="stats">
<dl class="statscount">
<dt>total posts:</dt>
<dd>5,299</dd>
</dl>
20000
</div>
EOF;
$doc = new DOMDocument();
$doc->loadHTML($html);
$selector = new DOMXPath($doc);
$value = $selector
->query('//dl[@class="statscount"]/dd/text()')
->item(0)
->nodeValue;
var_dump($value); // Output: string(5) "5,299"
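To get the total the question asks for, one approach is to run the same query against each page and add the numbers up after stripping the thousands separator. The sketch below assumes the pages have already been fetched into strings (in practice you might call file_get_contents() once per URL). The helper name statsCount() and the second page's value are made up for illustration.

```php
<?php
/**
 * Hypothetical helper: extract the number in <dl class="statscount"><dd>...</dd>
 * from one page's HTML, or return 0 if it isn't there.
 */
function statsCount(string $html): int
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real pages often produce parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $dd = $xpath->query('//dl[@class="statscount"]/dd')->item(0);
    if ($dd === null) {
        return 0;
    }
    // "5,299" -> "5299" -> 5299
    return (int) str_replace(',', '', trim($dd->textContent));
}

// Stand-ins for file_get_contents('http://...') on each of your domains.
$pages = [
    '<div class="stats"><dl class="statscount"><dt>total:</dt><dd>5,299</dd></dl>20000</div>',
    '<div class="stats"><dl class="statscount"><dt>total:</dt><dd>1,200</dd></dl></div>',
];

$total = 0;
foreach ($pages as $html) {
    $total += statsCount($html);
}

echo number_format($total), PHP_EOL; // 6,499
```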
Maybe a regex:
preg_match('/<dd>([^<]*)<\/dd>/', $htmlcode, $matches);
$result = $matches[1] ?? null; // e.g. "5,299"
I would like to place a new node element before a given element. I'm using insertBefore() for that, without success!
Here's the code:
<DIV id="maindiv">
<!-- I would like to place the new element here -->
<DIV id="child1">
<IMG />
<SPAN />
</DIV>
<DIV id="child2">
<IMG />
<SPAN />
</DIV>
</DIV>
//$div is a new div node element.
//The code I'm trying is the following:
$maindiv->item(0)->parentNode->insertBefore( $div, $maindiv->item(0) );
//Note: this code actually places the new $div element before <DIV id="maindiv">
//$maindiv is object(DOMNodeList)[5], from getElementsByTagName( 'div' )
//echo $maindiv->item(0)->nodeName gives 'div'
//echo $maindiv->item(0)->nodeValue gives the correct data on that div: 'some random text'
http://pastie.org/1070788
Any kind of help is appreciated, thanks!
If maindiv is from getElementsByTagName(), then $maindiv->item(0) is the div with id=maindiv. So your code is working correctly because you're asking it to place the new div before maindiv.
To make it work like you want, you need to get the children of maindiv:
$dom = new DOMDocument();
$dom->load($yoursrc);
$maindiv = $dom->getElementById('maindiv');
$items = $maindiv->getElementsByTagName('DIV');
$items->item(0)->parentNode->insertBefore($div, $items->item(0));
Note that if you don't have a DTD, PHP's getElementById() doesn't return anything. For getElementById() to work, you need to have a DTD or specify which attributes are IDs:
foreach ($dom->getElementsByTagName('DIV') as $node) {
$node->setIdAttribute('id', true);
}
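Here is a quick sketch that shows the DTD issue described above. With loadXML() and no DTD, a plain id attribute isn't treated as an ID, so getElementById() returns NULL until setIdAttribute() marks it. The markup is a trimmed, well-formed version of the question's, assumed for illustration.

```php
<?php
$xml = '<DIV id="maindiv"><DIV id="child1"><IMG/><SPAN/></DIV><DIV id="child2"><IMG/><SPAN/></DIV></DIV>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// No DTD, so "id" is just an ordinary attribute here.
var_dump($dom->getElementById('maindiv')); // NULL

// Declare every DIV's "id" attribute to be an ID attribute.
foreach ($dom->getElementsByTagName('DIV') as $node) {
    $node->setIdAttribute('id', true);
}

$maindiv = $dom->getElementById('maindiv');
echo $maindiv->getAttribute('id'), PHP_EOL; // maindiv
```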
From scratch, this seems to work too:
$str = '<DIV id="maindiv">Here is text<DIV id="child1"><IMG /><SPAN /></DIV><DIV id="child2"><IMG /><SPAN /></DIV></DIV>';
$doc = new DOMDocument();
$doc->loadHTML($str);
$divs = $doc->getElementsByTagName("div");
$divs->item(0)->appendChild($doc->createElement("div", "here is some content"));
print_r($divs->item(0)->nodeValue);
Found a solution:
$child = $maindiv->item(0);
$child->insertBefore( $div, $child->firstChild );
I don't know how much sense this makes, but well, it worked.
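This works because insertBefore() is called on the parent that should receive the new node, and its second argument is the existing child to insert in front of. Passing the parent's firstChild therefore puts the new node at the very start. Below is a self-contained sketch that uses loadHTML(), so getElementById() works without a DTD. The text of the new div is made up.

```php
<?php
$str = '<div id="maindiv"><div id="child1"><img><span></span></div><div id="child2"><img><span></span></div></div>';

$doc = new DOMDocument();
$doc->loadHTML($str);

$maindiv = $doc->getElementById('maindiv');
$div = $doc->createElement('div', 'new first child'); // hypothetical new node

// Call insertBefore() on the parent; the second argument is the reference child.
// If $maindiv had no children, firstChild would be NULL and this would append.
$maindiv->insertBefore($div, $maindiv->firstChild);

echo $doc->saveHTML($maindiv), PHP_EOL;
```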