I have some very simple code that grabs all the 'div' elements with the class name 'info_block'. How would I go about finding another element with the class name 'price' inside each 'info_block' and displaying it instead of the whole 'info_block' element?
Main goal: find the price in each element with the class name 'info_block', but do it inside the foreach, because I may need to find other elements as well.
<?php
$page = file_get_contents('example.com');
$dom = new DOMDocument();
$dom->loadHTML($page);
$xpath = new DOMXPath($dom);
$div1 = $xpath->query('//div[@class="info_block"]');
foreach ($div1 as $var1){
//echo $dom->saveHTML($var1);
}
?>
There is an element with the class name 'price' in each 'info_block', and I would like to display only that element, like so:
foreach ($div1 as $var1){
$dom2 = new DOMDocument();
$dom2->loadHTML($dom->saveHTML($var1));
$xpath2 = new DOMXPath($dom2);
$div2 = $xpath2->query('//div[@class="price"]');
$div2 = $div2->item(0);
echo $dom2->saveHTML($div2);
}
But instead of just giving me the price it returns the whole HTML for 'info_block' as it did before.
You can pass each <div class="info_block"> that was found as the second argument of ->query() and search for <div class="price"> relative to it:
$div1 = $xpath->query('//div[@class="info_block"]');
foreach ($div1 as $var1){
$div2 = $xpath->query('./div[@class="price"]', $var1);
// ^ each div
$div2 = $div2->item(0);
echo $dom->saveHTML($div2);
}
Note: You do not need to create another instance of DOM and DOMXpath.
This example assumes HTML structured like this:
<div class="info_block"> <!-- each info block -->
<div class="price">1</div> <!-- contains a price -->
</div>
<div class="info_block">
<div class="price">2</div>
</div>
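To make the context-node idea concrete, here is a minimal self-contained sketch run against inline markup rather than a fetched page. The sample HTML and the $prices array are assumptions made up for illustration; only the query() call with a second argument comes from the answer above.

```php
<?php
// Sample markup (hypothetical), shaped like the structure shown above.
$html = '<div class="info_block"><div class="price">1</div><p>other</p></div>'
      . '<div class="info_block"><div class="price">2</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$prices = [];
foreach ($xpath->query('//div[@class="info_block"]') as $block) {
    // The second argument is the context node; "./" means "direct child of $block".
    $price = $xpath->query('./div[@class="price"]', $block)->item(0);
    if ($price !== null) {
        $prices[] = trim($price->textContent);
    }
}

print_r($prices);
```

Checking item(0) against null avoids an error when a block has no price element.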
You can combine queries in XPath with the union operator (|) to find all the desired elements in one go:
$xpath->query('//div[@class="info_block"]|//div[@class="price"]');
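As a rough illustration of what the union returns, the sketch below runs it on made-up sample markup. Note that the result is a single flat list containing both the info_block and the price elements, so if only the prices are wanted, a nested path may be a better fit. The sample HTML and variable names are assumptions.

```php
<?php
// Hypothetical sample markup with two info blocks.
$html = '<div class="info_block"><div class="price">1</div></div>'
      . '<div class="info_block"><div class="price">2</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Union: every node matched by either expression, in one flat node list.
$nodes = $xpath->query('//div[@class="info_block"] | //div[@class="price"]');
$classes = [];
foreach ($nodes as $node) {
    $classes[] = $node->getAttribute('class');
}

// Nested path: only the price divs that sit directly inside an info_block.
$pricesOnly = $xpath->query('//div[@class="info_block"]/div[@class="price"]');

echo count($classes), " nodes from the union, ", $pricesOnly->length, " from the nested path\n";
```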
You can pass a DOM element as the context node for relative XPath queries. It is the optional second argument of the DOMXPath::query() method:
<?php
$page = file_get_contents('example.com');
$dom = new DOMDocument();
$dom->loadHTML($page);
$xpath = new DOMXPath($dom);
$div1 = $xpath->query('//div[@class="info_block"]');
foreach ($div1 as $var1){
// The leading dot keeps the search inside $var1; a bare // would scan the whole document.
$div2 = $xpath->query('.//a[@class="price"]', $var1);
foreach ($div2 as $var2) {
echo $var2->nodeValue. "\n";
}
}
?>
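One detail worth checking when using a context node: an expression that starts with // is absolute and searches the whole document even when a context node is supplied, while .// searches only below that node. A small sketch on made-up markup (the HTML and variable names are assumptions) shows the difference:

```php
<?php
// Hypothetical sample markup with two info blocks, one price each.
$html = '<div class="info_block"><div class="price">1</div></div>'
      . '<div class="info_block"><div class="price">2</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$first = $xpath->query('//div[@class="info_block"]')->item(0);

// Absolute path: the context node is ignored and both prices are found.
$absolute = $xpath->query('//div[@class="price"]', $first)->length;

// Relative path: only descendants of $first are searched.
$relative = $xpath->query('.//div[@class="price"]', $first)->length;

echo "absolute: $absolute, relative: $relative\n";
```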
For more details, see the PHP documentation for DOMXPath::query().
Related
I'm trying to extract some data using the DOM parser.
My code:
<?php
// create new DOMDocument
$document = new \DOMDocument('1.0', 'UTF-8');
// set error level
$internalErrors = libxml_use_internal_errors(true);
$data = '<div id="show">
<ul class="browse_in_widget_col">
<li>
<a href="accounting/">
Accounting
</a>
<span>
(7420)
</span>
</li>
</div>';
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xp = new DOMXPath($dom);
$makes = $xp->query('//ul[@class="browse_in_widget_col"]/ul');
$makeList = [];
foreach ( $makes as $make ) {
$makeList[] = $make->textContent;
}
print_r($makeList);
?>
Here I want to extract the text inside the <a> element.
For example, I need "Accounting" from this markup. How can I do that?
Please help me get the values of all the <a> tags. Right now I'm getting an empty array.
Your XPath expression looks for a <ul> nested directly inside the <ul>, and there isn't one. If you just want the contents of the <a> tags, you can change the query to //ul[@class="browse_in_widget_col"]//a.
$xp = new DOMXPath($dom);
$makes = $xp->query('//ul[@class="browse_in_widget_col"]//a');
$makeList = [];
foreach ( $makes as $make ) {
$makeList[] = trim($make->textContent);
}
I've also added trim() to remove the leading and trailing whitespace around each value.
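As an alternative to trim(), the whitespace can also be cleaned up on the XPath side with normalize-space(), evaluated once per anchor through DOMXPath::evaluate(). This is only a sketch: the markup is a shortened copy of the question's sample with the <ul> closed, and keying the result by href is an assumption for illustration.

```php
<?php
// Shortened sample markup based on the question (closing </ul> added).
$data = '<div id="show"><ul class="browse_in_widget_col"><li>
<a href="accounting/">
    Accounting
</a>
<span>(7420)</span>
</li></ul></div>';

$dom = new DOMDocument();
$dom->loadHTML($data);
$xp = new DOMXPath($dom);

$makeList = [];
foreach ($xp->query('//ul[@class="browse_in_widget_col"]//a') as $a) {
    // evaluate() returns a plain string here because normalize-space() yields a string.
    $makeList[$a->getAttribute('href')] = $xp->evaluate('normalize-space(.)', $a);
}

print_r($makeList);
```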
Can you echo the results of a document parser or do you have to first create an array to display the results? Anyway, when running the code, nothing appears (no output or errors), and I have tried both methods. Could possibly be a site issue but I have tried a few others and get the same result.
<?php
$ebayquery ='halo';
$ebayhtml = 'https://www.ebay.com/sch/i.html_from=R40&_trksid=p2380057.m570.l1311.R6.TR12.TRC2.A0.H0.X.TRS0&_nkw=' . $ebayquery . '&_sacat=0';
$ebayresults = array();
$document = new \DOMDocument('1.0', 'UTF-8');
$internalErrors = libxml_use_internal_errors(true);
$document->loadHTML($ebayhtml);
libxml_use_internal_errors($internalErrors);
$xpath = new DOMXpath($document);
$links = $xpath->query('//h3[@id="lvtitle"]/a');
foreach($links as $a) {
echo $a->nodeValue;
}
?>
There are a couple of problems with the code. First, loadHTML() takes a string of HTML, not a filename or URI, so you have to fetch the web page first and pass its contents in (I've used file_get_contents() here).
Second, the XPath was looking for any <h3> tag with an id attribute of lvtitle, but the page only has elements where the class attribute is lvtitle. I've updated the XPath expression to use the class instead.
$ebayquery ='halo';
$ebayhtml = 'https://www.ebay.com/sch/i.html_from=R40&_trksid=p2380057.m570.l1311.R6.TR12.TRC2.A0.H0.X.TRS0&_nkw=' . $ebayquery . '&_sacat=0';
$ebayresults = array();
$document = new \DOMDocument('1.0', 'UTF-8');
$internalErrors = libxml_use_internal_errors(true);
$ebayhtml = file_get_contents($ebayhtml);
$document->loadHTML($ebayhtml);
libxml_use_internal_errors($internalErrors);
$xpath = new DOMXpath($document);
$links = $xpath->query('//h3[@class="lvtitle"]/a');
print_r($links);
foreach($links as $a) {
echo $a->nodeValue.PHP_EOL;
}
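To answer the first part of the question: results can be echoed directly or collected into an array first, either works. Below is a hedged sketch of the same parsing steps wrapped in a function that takes the HTML as a string, which makes it easy to try on inline markup before pointing it at a live page. The function name extractLinkTexts and the sample markup are hypothetical.

```php
<?php
/**
 * Hypothetical helper: parse an HTML string and return the trimmed text
 * of every node matched by the given XPath query.
 */
function extractLinkTexts(string $html, string $query): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Made-up markup standing in for a fetched results page.
$sample = '<h3 class="lvtitle"><a href="/itm/1">Halo 3</a></h3>'
        . '<h3 class="lvtitle"><a href="/itm/2">Halo Reach</a></h3>';

$titles = extractLinkTexts($sample, '//h3[@class="lvtitle"]/a');
echo implode(PHP_EOL, $titles), PHP_EOL;
```

For a real page, the string would come from file_get_contents($url); since that call returns false on failure, it is worth checking the result before passing it on.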
Good day.
I want to find a certain HTML attribute on an external website.
I want to get the href value of an <a> element, but the problem is that its id, class, and name are random.
<div class="static">
Dynamic
</div>
This code should display all the hrefs in http://example.com
In this case I use DOMDocument and XPath to select the elements you want to access because it's very flexible and easy to use.
<?php
$html = file_get_contents("http://example.com");
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DomXPath($doc);
$nodeList = $xpath->query("//a/@href");
print_r($nodeList);
// To access the values inside nodes
foreach($nodeList as $node){
echo "<p>" . $node->nodeValue . "</p>";
}
Use jQuery to get the value as follows:
var link = $(".static>a").attr("href");
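If the page has to be processed server-side, a rough PHP counterpart of the .static > a selector is the XPath //div[@class="static"]/a/@href. The sketch below assumes the random attributes sit on an <a> inside the static div; the sample markup is invented for illustration, since the question's snippet does not show the anchor.

```php
<?php
// Hypothetical markup: a stable parent class, a random class on the anchor.
$html = '<div class="static"><a class="x9f2" href="/target">Dynamic</a></div>'
      . '<a href="/other">Other</a>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$hrefs = [];
// Select the href attribute of every <a> that is a direct child of div.static.
foreach ($xpath->query('//div[@class="static"]/a/@href') as $attr) {
    $hrefs[] = $attr->nodeValue;
}

print_r($hrefs);
```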
You can use PHP DOMDocument:
<?php
$exampleurl = "http://YourDomain.com"; //set your url
$filterClass = "dynamicclass";
$dom = new DOMDocument('1.0');
@$dom->loadHTMLFile($exampleurl);
$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $element) {
$href = $element->getAttribute('href'); // all href
$class = $element->getAttribute('class');
if($class==$filterClass){
echo $href;
}
}
?>
I am working on a script which is getting data from HTML DOM elements.
Here is my code:
$url = 'http://www.sportsdirect.com/nike-satire-mens-skate-shoes-242188?colcode=24218822';
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXpath($doc);
$Name = $xpath->query('//span[@id="ProductName"]')->item(0)->nodeValue;
echo $Name;
This code is simply taking the text inside <span id="ProductName"></span>. I know how to get the data from elements with specific class or id.
I don't know how I can get the src="http://adres-to-image.com/img.png" (pure example) from image tag or how I can get elements which do not have id or class but have attribute like itemprop, for example <div itemprop="name"></div>
How can I get the image src?
How can I get elements with itemprop?
For your examples:
$xpath->query('//img/@src')->item(0)->nodeValue
This means:
Select the src attributes of all img tags and get the value of the first one.
$xpath->query('//div[@itemprop="name"]')->item(0)->nodeValue
This means:
Select all divs whose itemprop attribute equals name and get the value of the first one.
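Here is a small self-contained sketch of both lookups on made-up markup. It uses DOMXPath::evaluate() with the XPath string() function, which returns an empty string when nothing matches instead of failing on a null item(0). The sample HTML and values are assumptions.

```php
<?php
// Hypothetical product markup.
$html = '<div itemprop="name">Nike Satire</div>'
      . '<img src="/images/shoe.png" alt="shoe">';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// string() converts the first matched node to its string value.
$src     = $xpath->evaluate('string(//img/@src)');
$name    = $xpath->evaluate('string(//div[@itemprop="name"])');
$missing = $xpath->evaluate('string(//div[@itemprop="brand"])');

echo $src, PHP_EOL, $name, PHP_EOL;
```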
You just look for the attributes:
$url = 'http://www.sportsdirect.com/nike-satire-mens-skate-shoes-242188?colcode=24218822';
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXpath($doc);
$Name = $xpath->query('//div[@class="productImageSash"]');
foreach($Name as $element){
$imgs = $element->getElementsByTagName('img');
foreach($imgs as $img){
$src = $img->getAttribute('src');
echo $src;
}
}
Output:
/images/sash/productsash_mustgo.png
The same goes for the itemprop attribute: look for divs that have it:
$Name = $xpath->query('//div');
foreach($Name as $element){
$itemprop = $element->getAttribute('itemprop');
if($itemprop){
echo "found";
}
}
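The attribute test can also be pushed into the XPath itself: //div[@itemprop] matches only divs that carry the attribute, whatever its value. A minimal sketch on invented markup (the HTML and the $props array are assumptions):

```php
<?php
// Hypothetical markup: two divs with itemprop, one without.
$html = '<div itemprop="name">Nike Satire</div>'
      . '<div>no attribute here</div>'
      . '<div itemprop="price">29.99</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$props = [];
// [@itemprop] is an existence test: the attribute must be present.
foreach ($xpath->query('//div[@itemprop]') as $div) {
    $props[$div->getAttribute('itemprop')] = trim($div->textContent);
}

print_r($props);
```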
I'm trying to fetch the content of a div in a html page using xpath and domdocument. This is the structure of the page:
<div id="content">
<div class="div1"></div>
<span class="span1"></span>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<div class="div2"></div>
</div>
I want to get only the content of the p elements, not the spans and divs. I came up with the XPath expression .//*[@id='content']/p, but something's not right because I'm getting only the first p. I also tried other expressions with following-sibling and node(), but they all return the first p only.
.//*[@id='content']/span/following-sibling::p
.//*[@id='content']/node()[self::p]
This is how I use XPath:
$domDocument=new DOMDocument();
$domDocument->encoding = 'UTF-8';
$domDocument->loadHTML($page);
$domXPath = new DOMXPath($domDocument);
$domNodeList = $domXPath->query($this->xpath);
$content = $this->GetHTMLFromDom($domNodeList);
And this is how I get the HTML from the nodes:
private function GetHTMLFromDom($domNodeList){
$domDocument = new DOMDocument();
$node = $domNodeList->item(0);
foreach($node->childNodes as $childNode)
$domDocument->appendChild($domDocument->importNode($childNode, true));
return $domDocument->saveHTML();
}
This XPath expression:
//div[@id='content']/p
results in the wanted node set (five p elements).
EDIT: Now it's clear what your problem is. You need to iterate over the whole NodeList instead of taking only item(0):
private function GetHTMLFromDom($domNodeList){
$domDocument = new DOMDocument();
foreach ($domNodeList as $node) {
$domDocument->appendChild($domDocument->importNode($node, true));
}
return $domDocument->saveHTML();
}
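If the nodes only need to be serialized, importing them into a second document may not be necessary: saveHTML() accepts a node argument, so each matched p can be dumped from the original document. This is a sketch on shortened, made-up markup, not the asker's actual page.

```php
<?php
// Shortened sample of the structure from the question (two p elements).
$page = '<div id="content"><div class="div1"></div><span class="span1"></span>'
      . '<p>one</p><p>two</p><div class="div2"></div></div>';

$dom = new DOMDocument();
$dom->loadHTML($page);
$xpath = new DOMXPath($dom);

$html = '';
// Iterate over every matched node, not just item(0).
foreach ($xpath->query("//div[@id='content']/p") as $p) {
    $html .= $dom->saveHTML($p);
}

echo $html, PHP_EOL;
```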