How to extract the contents inside a div based on its class? - php

I tried this code:
$html= file_get_contents("page.html");
$dom = new DOMDocument;
$dom->loadHTML($html);
$div = $dom->getElementsByClassName('mydiv1');
$result = $dom->saveHTML($div);
echo $result;
page.html
<html>
<body>
<div id="test">
<div class="mydiv1">Hello</div>
<div class="mydiv2">How are you</div>
</div>
</body>
</html>
But when I select by id instead, it works, like this:
$html= file_get_contents("page.html");
$dom = new DOMDocument;
$dom->loadHTML($html);
$div = $dom->getElementById('test');
$result = $dom->saveHTML($div);
echo $result;
How can I get the content based on the class?

Try this code,
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div[@class="mydiv1"]');
$div = $div->item(0);
$result = $dom->saveXML($div);
echo $result;

There is no getElementsByClassName method in DOMDocument (yet), but the same result can be produced with DOMXPath:
$dom = new DOMDocument();
$dom->loadHTMLFile($filePath); // load() parses XML; loadHTMLFile() is the HTML counterpart
$finder = new DOMXPath($dom);
$nodes = $finder->query('//div[@class="mydiv1"]');
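Note that @class="mydiv1" only matches when the attribute is exactly that string. A minimal sketch of a whole-token class match follows, using the page.html markup from the question inlined as a string; findByClass is a hypothetical helper name, not part of DOMDocument, and it assumes the class name contains no quote characters.

```php
<?php
// Markup from the question, inlined so the sketch is self-contained.
$html = <<<'HTML'
<html>
<body>
<div id="test">
<div class="mydiv1">Hello</div>
<div class="mydiv2">How are you</div>
</div>
</body>
</html>
HTML;

// Hypothetical helper: match a class as a whole token, so "mydiv1" also
// matches class="box mydiv1" but not class="mydiv10".
// Assumption: $class contains no quote characters.
function findByClass(DOMXPath $xpath, string $class): DOMNodeList
{
    $expr = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    return $xpath->query($expr);
}

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$nodes = findByClass($xpath, 'mydiv1');
$first = $nodes->item(0);                 // null when nothing matched
$text = $first !== null ? $first->textContent : '';
echo $text, "\n";                         // prints "Hello"
```

To get the markup rather than the text, pass $first to $dom->saveHTML() as in the question.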

Related

XPath extract attribute from <div> in PHP

I want to extract an attribute from a <div> and display its value.
<div class="b-text-4xl b-text-btc-first b-font-bold btcecc-animated liveup livedown" data-price="38696.15125182" data-live-price="bitcoin" data-rate="1" data-currency="USD" data-timeout="1610051644181"><span>38,696.15</span> <b class="fiat-symbol">$</b></div>
I need the value of "data-price".
The location of the full html is at https://www.btc-echo.de/kurs/bitcoin/
I tried this:
$url = "https://www.btc-echo.de/kurs/bitcoin/";
libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTML(utf8_encode(file_get_contents($url)));
$xpath = new DOMXpath($doc);
foreach ($xpath->query('*[@id="main"]/div[1]/div[3]/div[2]/div[1]/div/div/div[1]/div/@data-price') as $textNode) {
echo $textNode->nodeValue;
}
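One possible approach, sketched here against the <div> quoted above rather than the live page, is to select the element by one of its own attributes instead of a long positional path. This assumes the page serves that markup as shown and that data-live-price="bitcoin" identifies the right element; if the value is filled in by JavaScript, no server-side parser will see it.

```php
<?php
// The <div> from the question, inlined so the sketch is self-contained.
$html = <<<'HTML'
<div class="b-text-4xl b-text-btc-first b-font-bold btcecc-animated liveup livedown" data-price="38696.15125182" data-live-price="bitcoin" data-rate="1" data-currency="USD" data-timeout="1610051644181"><span>38,696.15</span> <b class="fiat-symbol">$</b></div>
HTML;

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Start with // so the search covers the whole document, then read the
// attribute as a string. evaluate() returns '' when nothing matches.
$price = $xpath->evaluate('string(//div[@data-live-price="bitcoin"]/@data-price)');
echo $price, "\n";                  // prints "38696.15125182"
```

An expression that begins with a bare * is relative to the document node and only tests the root element, which is one likely reason the positional query returns nothing.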

Get part of text with DOMDocument in PHP

I have this HTML coming from a file_get_contents call:
<div class="attractions-attraction-filtered-common-ListingsHeader__listingsCount--PflJ1">
<span>We found <b>10 results</b> for you.</span>
</div>
How can I get the number of results (i.e.: 10)?
Note, that the part PflJ1 is something random.
This is what I tried:
$page = file_get_contents($url);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXpath($dom);
$activitiesNb = $xp->query('//div/span/text()');
$activitiesNb = $activitiesNb->nodeValue;
echo $activitiesNb;
But it does not work.
What am I missing, please?
Thanks.
Using evaluate():
$results = $xp->evaluate('string(//span[contains(., "We found ")]/b/text())');
Your XPath needs to be fixed (you're looking for the content of a b element). Also, query() returns a node list, so use item() in combination with nodeValue.
<?php
$html = <<<'HTML'
<div class="attractions-attraction-filtered-common-ListingsHeader__listingsCount--PflJ1">
<span>We found <b>10 results</b> for you.</span>
</div>
HTML;
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXpath($document);
$res = $xpath->query('//div[starts-with(@class,"attractions-")]/span/b');
$val = $res->item(0)->nodeValue;
echo substr($val, 0, 2);
?>
Output : 10
Alternative:
$res = $xpath->evaluate("substring(//div[starts-with(@class,'attractions-')]/span/b,1,2)");
echo $res;
And if the number has only one digit (e.g. 8):
echo substr($val, 0, 1);
or
$res = $xpath->evaluate("substring(//div[starts-with(@class,'attractions-')]/span/b,1,1)");
echo $res;
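The fixed-width substring calls above depend on knowing how many digits the count has. A minimal sketch that handles any number of digits, assuming the label always contains the number somewhere in the <b> element, could read the whole label and pull the first run of digits out with preg_match:

```php
<?php
// Markup from the question, inlined so the sketch is self-contained.
$html = <<<'HTML'
<div class="attractions-attraction-filtered-common-ListingsHeader__listingsCount--PflJ1">
<span>We found <b>10 results</b> for you.</span>
</div>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// string() yields the text of the first matching <b>, here "10 results".
$label = $xpath->evaluate(
    'string(//div[starts-with(@class, "attractions-")]/span/b)'
);

// Take the first run of digits, whatever its length; null when none found.
$count = preg_match('/\d+/', $label, $m) === 1 ? (int) $m[0] : null;
echo $count, "\n";   // prints "10"
```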

How to web-scrape inside divs with DOMParser

I am trying to get a div from one page and then do the same for other pages in a loop,
but I am running into some trouble:
<div class="article_info">
<ul class="c-result_box">
<li>
<div class="inner cf">
<div class="c-header">
<div class="c-logo">
<img src="/e/designs/31sumai/common/img/logo_08.png" alt="#">
</div>
<p class="c-supplier">三井のマンション</p>
<p class="c-name">
パークリュクス大阪天満
</p>
I'm trying to get the text inside the <a> element. Here is my code; what am I missing?
$start_id = 1501;
while(true){
$url = 'https://www.31sumai.com/mfr/K'.$start_id.'/outline.html';
$html = file_get_contents($url);
libxml_use_internal_errors(true);
$DOMParser = new \DOMDocument();
$DOMParser->loadHTML($html);
$xpath = new \DOMXPath($DOMParser);
$classname="c-name";
$nodes = $finder->query("//*[contains(@class, '$classname')]");
$MyTable = false;
$insertData = [];
foreach($nodes as $node){
$allNames = [];
foreach($node->getElementsByTagName('a') as $a){
$name = $a->getElementsByTagName('a');
$allProperties[] = [
'names' => $name];
}
}
Thank you for helping!
You can rely on your XPath query to pull all the text nodes that you want, and then just read the nodeValue property within your loop:
$start_id = "1501";
$url = "https://www.31sumai.com/mfr/K$start_id/outline.html";
$html = file_get_contents($url);
libxml_use_internal_errors(true);
$DOMParser = new \DOMDocument();
$DOMParser->loadHTML($html);
$xpath = new \DOMXPath($DOMParser);
$classname="c-name";
$nodes = $xpath->query("//*[contains(@class, '$classname')]/a/text()");
foreach($nodes as $node){
echo $node->nodeValue;
}

PHP Xpath Error already defined in Entity not showing results

I am getting errors in this PHP XPath app that I cannot fix. I would appreciate some help if possible.
<?php
//Get Username
$username = $_GET["u"];
$html = file_get_contents('http://us.playstation.com/publictrophy/index.htm?onlinename=' .$username);
$html = tidy_repair_string($html);
$doc = new DomDocument();
$doc->loadHtml($html);
$xpath = new DomXPath($doc);
// Now query the document:
foreach ($xpath->query('//*[@id="id-handle"]') as $node) {
echo $node, "\n";
}
foreach ($xpath->query('//*[@id="leveltext"]') as $node1) {
echo $node1, "\n";
}
?>
Put @ before $dom->loadHTML($html), because loadHTML usually raises a lot of warnings and notices on real-world markup:
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
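As an alternative to suppressing the warnings, libxml_use_internal_errors() collects them so they can be inspected or discarded. The sketch below uses hypothetical markup standing in for the fetched page, with a duplicated id of the kind that can make the parser complain. It also prints nodeValue rather than the node object, since a DOMElement cannot be converted to a string by echo.

```php
<?php
// Hypothetical markup standing in for the fetched page (not the real site).
$html = '<div id="id-handle">player1</div>'
      . '<div id="leveltext">12</div>'
      . '<div id="leveltext">dup</div>';

libxml_use_internal_errors(true);   // collect parser messages instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$problems = libxml_get_errors();    // array of LibXMLError objects, inspect if needed
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// string() takes the first matching node in document order.
$handle = $xpath->evaluate('string(//*[@id="id-handle"])');
$level  = $xpath->evaluate('string(//*[@id="leveltext"])');

echo $handle, "\n";   // prints "player1"
echo $level, "\n";    // prints "12"
```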

get value of <h2> of html page with PHP DOM?

I have a variable $link holding an HTTP (craigslist) link, and I load its contents into $linkhtml, so $linkhtml contains the HTML code of that craigslist page.
I need to extract the text between <h2> and </h2>. I could use a regexp, but how do I do this with PHP DOM? I have this so far:
$linkhtml= file_get_contents($link);
$dom = new DOMDocument;
@$dom->loadHTML($linkhtml);
What do I do next to put the contents of the element <h2> into a var $title?
If DOMDocument looks complicated to you, you could try PHP Simple HTML DOM Parser, a third-party library with a simpler, selector-based API for parsing HTML.
require 'simple_html_dom.php';
$html = '<h1>Header 1</h1><h2>Header 2</h2>';
$dom = new simple_html_dom();
$dom->load( $html );
$title = $dom->find('h2',0)->plaintext;
echo $title; // outputs: Header 2
You can use this code:
$linkhtml= file_get_contents($link);
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($linkhtml); // loads your html
$xpath = new DOMXPath($doc);
$h2text = $xpath->evaluate("string(//h2/text())");
// $h2text is your text between <h2> and </h2>
You can do this with XPath (untested, may contain errors):
$linkhtml= file_get_contents($link);
$dom = new DOMDocument;
@$dom->loadHTML($linkhtml);
$xpath = new DOMXpath($dom);
$elements = $xpath->query("/html/body/h2");
if (!is_null($elements)) {
foreach ($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
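For a single tag name, XPath is not strictly needed. A minimal sketch using DOMDocument::getElementsByTagName follows; the inline HTML is a stand-in for the fetched craigslist page, and it assumes the first <h2> on the page is the one wanted.

```php
<?php
// Stand-in for $linkhtml = file_get_contents($link);
$linkhtml = '<html><body><h1>Header 1</h1><h2>Header 2</h2></body></html>';

libxml_use_internal_errors(true);   // real pages often trigger parser warnings
$dom = new DOMDocument();
$dom->loadHTML($linkhtml);
libxml_clear_errors();

// item(0) is the first <h2> in document order, or null if there is none.
$h2 = $dom->getElementsByTagName('h2')->item(0);
$title = $h2 !== null ? trim($h2->textContent) : null;

echo $title, "\n";   // prints "Header 2"
```

Unlike the /html/body/h2 path, this finds an <h2> at any depth in the document.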
