XPath query help: extracting a nested anchor's href - PHP

Here is my code:
$text = '<div class="cgus_post">Hello</div>';
$dom = new DomDocument();
$dom->loadHTML($text);
$classname = 'cgus_post';
$finder = new DomXPath($dom);
$nodes = $finder->query('//div[class="cgus_post"]//@href');
I'm trying to get the href value of an anchor link inside the div with class cgus_post. What's wrong with my query?

Probably a missing "@" before class:
'//div[@class="cgus_post"]//@href'

The XPath is incorrect. It should be:
//div[@class="cgus_post"]/a
Then $nodes will be a list of all <a> elements that are direct children of a <div class="cgus_post"> (use //a instead of /a to include deeper descendants), and you get their hrefs with:
foreach($nodes as $node) {
    $href = $node->getAttribute('href');
    echo $href, "\n";
}
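For a quick end-to-end check, here is a minimal sketch of the corrected query. It assumes the anchor sits inside the div (the question's sample markup lost its <a> tag), so the href value below is a made-up placeholder. It selects the href attributes directly, so each result is a DOMAttr whose value is the URL. Using //a rather than /a also catches anchors nested deeper inside the div.

```php
<?php
// Hypothetical sample markup: the href value is a placeholder for illustration.
$text = '<div class="cgus_post"><a href="https://example.com/post">Hello</a></div>';

$dom = new DOMDocument();
$dom->loadHTML($text);

$finder = new DOMXPath($dom);
// @class tests the attribute; //a/@href selects the href of any anchor below the div.
$nodes = $finder->query('//div[@class="cgus_post"]//a/@href');

$hrefs = [];
foreach ($nodes as $attr) {
    // Each result is a DOMAttr; its value is the href text.
    $hrefs[] = $attr->value;
}
print_r($hrefs);
```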

Related

PHP DomDocument get anchor tag href and inner html?

The code below gets all the anchor tags and their inner text on a page. What if the inner HTML is an image?
For example, <img src="test.png"/>. How do I get the src of the image?
$url = $_POST['url'];
$html = file_get_contents($url);
$dom = new DOMDocument;
@$dom->loadHTML($html);
//Get all links.
$links = $dom->getElementsByTagName('a');
//Iterate over the extracted links and display their URLs
foreach ($links as $link){
//Extract and show the "href" attribute.
$href = $link->getAttribute('href');
$text = $link->nodeValue;
}
How do I solve this?
The DOMXPath class is well suited to this kind of problem.
$html = '<a href="test.html"><img src="test.png"/></a><a href="example.com">Click me!</a>';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);
$nodes = $xPath->query('//a/img/@src');
//test output
foreach($nodes as $node){
echo $node->nodeName." : ".$node->textContent.'<br>';
}
Output:
src : test.png
Or, if the href attributes are also required, use:
$nodes = $xPath->query('//a/img/@src|//a/@href');
and you get:
href : test.html
src : test.png
href : example.com
Another variant:
$nodes = $xPath->query('//a/img/@src|//a[not(img)]/text()');
Result:
src : test.png
#text : Click me!
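To cover the original question (the href plus whatever is inside the anchor), here is a sketch that loops over each <a> and runs a relative query with the anchor as the context node. When there is no <img>, it falls back to the link text. The markup is the same assumed sample as above, and the href/content keys are just illustrative names.

```php
<?php
$html = '<a href="test.html"><img src="test.png"/></a><a href="example.com">Click me!</a>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);

$results = [];
foreach ($xPath->query('//a') as $link) {
    // Relative query: "./img" is evaluated with $link as the context node.
    $img = $xPath->query('./img', $link)->item(0);
    $results[] = [
        'href'    => $link->getAttribute('href'),
        // Use the image's src when the anchor wraps an <img>, otherwise its text.
        'content' => $img instanceof DOMElement ? $img->getAttribute('src') : $link->textContent,
    ];
}
print_r($results);
```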

get value of href inside of div from external site using PHP

Good day, Sir/Ma'am.
I want to find a certain HTML element on an external website.
I want to get the href value of the anchor, but the problem is that its id, class, or name is random.
<div class="static">
Dynamic
</div>
This code should display all the hrefs on http://example.com.
Here I use DOMDocument and XPath to select the elements you want to access, because they are flexible and easy to use.
<?php
$html = file_get_contents("http://example.com");
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DomXPath($doc);
$nodeList = $xpath->query("//a/@href");
print_r($nodeList);
// To access the values inside nodes
foreach($nodeList as $node){
echo "<p>" . $node->nodeValue . "</p>";
}
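The anchor's own id, class, or name changes, but the wrapping <div class="static"> does not. One option is therefore to anchor the XPath on the container. This is a sketch only. The markup below is a hypothetical stand-in for the external page (in practice $html would come from file_get_contents() as above), and the sketch assumes the div's class attribute is exactly "static".

```php
<?php
// Hypothetical stand-in for the external page; the anchor's class is "random".
$html = '<div class="static"><a class="x7f3q" href="https://example.com/target">Dynamic</a></div>'
      . '<div class="other"><a href="https://example.com/ignore">Other</a></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select only href attributes of anchors that are direct children of div.static,
// ignoring whatever random id/class/name the anchor itself carries.
$hrefs = [];
foreach ($xpath->query('//div[@class="static"]/a/@href') as $attr) {
    $hrefs[] = $attr->nodeValue;
}
print_r($hrefs);
```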
Use jQuery to get the value as follows:
var link = $(".static>a").attr("href");
You can use PHP DOMDocument:
<?php
$exampleurl = "http://YourDomain.com"; //set your url
$filterClass = "dynamicclass";
$dom = new DOMDocument('1.0');
@$dom->loadHTMLFile($exampleurl);
$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $element) {
$href = $element->getAttribute('href'); // all href
$class = $element->getAttribute('class');
if($class==$filterClass){
echo $href;
}
}
?>
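One caveat with $class==$filterClass is that it only matches when the class attribute is exactly that single value, so class="btn dynamicclass" is skipped. A hedged alternative is to filter in XPath with the common concat/normalize-space idiom, which matches dynamicclass as a whole class name among several. The sample markup is hypothetical. The sketch also assumes $filterClass contains no double quotes, because it is interpolated into the expression.

```php
<?php
// Hypothetical sample: only the first two anchors carry the whole class "dynamicclass".
$html = '<a class="btn dynamicclass" href="/one">1</a>'
      . '<a class="dynamicclass" href="/two">2</a>'
      . '<a class="dynamicclass-old" href="/three">3</a>';
$filterClass = 'dynamicclass';

$dom = new DOMDocument('1.0');
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Pad the normalized class list with spaces so " dynamicclass " only matches a whole
// class name. Assumes $filterClass contains no double quotes.
$query = sprintf(
    '//a[contains(concat(" ", normalize-space(@class), " "), " %s ")]/@href',
    $filterClass
);

$hrefs = [];
foreach ($xpath->query($query) as $attr) {
    $hrefs[] = $attr->nodeValue;
}
print_r($hrefs);
```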

How to query a DOMNode using XPath in PHP?

I'm trying to get the Bing search results with XPath. Here is my code:
$html = file_get_contents("http://www.bing.com/search?q=bacon&first=11");
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHtml($html);
$x = new DOMXpath($doc);
$output = array();
// just grab the urls for now
foreach ($x->query("//li[@class='b_algo']") as $node)
{
//$output[] = $node->getAttribute("href");
$tmpDom = new DOMDocument();
$tmpDom->loadHTML($node);
$tmpDP = new DOMXPath($tmpDom);
echo $tmpDP->query("//div[@class='b_title']//h2//a//href");
}
return $output;
This foreach iterates over all results. All I want is to extract the link and text from each $node, but because $node is a DOMElement object rather than an HTML string, I can't create a DOMDocument from it. How can I query it?
First of all, your XPath expression tries to match non-existent href child elements; query @href to select the attribute.
You don't need to create any new DOMDocuments; just pass $node as the context node:
foreach ($x->query("//li[@class='b_algo']") as $node)
{
var_dump( $x->query("./div[@class='b_title']//h2//a//@href", $node)->item(0) );
}
If you're just interested in the URLs, you could also query them directly:
foreach ($x->query("//li[@class='b_algo']/div[@class='b_title']/h2/a/@href") as $node)
{
var_dump($node);
}
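Putting both points together, here is a minimal sketch that collects the URL and link text for each result using context-node queries. The HTML string is a hypothetical imitation of the structure the queries above expect. Bing's live markup may differ and can change over time.

```php
<?php
// Hypothetical markup modelled on the structure queried above; not Bing's real HTML.
$html = '<ul>'
    . '<li class="b_algo"><div class="b_title"><h2><a href="https://a.example/">First</a></h2></div></li>'
    . '<li class="b_algo"><div class="b_title"><h2><a href="https://b.example/">Second</a></h2></div></li>'
    . '</ul>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate malformed real-world markup
$doc->loadHTML($html);
$x = new DOMXPath($doc);

$output = [];
foreach ($x->query("//li[@class='b_algo']") as $node) {
    // The leading "." makes the path relative to $node (the context node).
    $link = $x->query("./div[@class='b_title']//h2//a", $node)->item(0);
    if ($link instanceof DOMElement) {
        $output[] = ['url' => $link->getAttribute('href'), 'text' => trim($link->textContent)];
    }
}
print_r($output);
```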

PHP DOMDocument Anchor Tags

I am using DOMDocument to parse all anchor tags from a string of HTML. I need to store all the anchors whose href does not contain a certain value in an array. Right now I can loop through all the anchors and filter out the correct ones, but I cannot store the original anchor. I can access the href and text values with things like $node->getAttribute('href'), but how do I get the anchor in its original form, i.e. the whole <a> tag wrapping "Some Text"? Thanks! Here is the code I have now:
$dom = new DOMDocument();
$dom->loadHtml(mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8"));
$anchors = array();
foreach ($dom->getElementsByTagName('a') as $node) {
if(strpos($node->getAttribute('href'), 'some value') === false){ // strpos() returns false when not found, never true
$anchors[] = $node; // TODO: need to store the entire original anchor tag
}
}
Try this:
if(strpos($node->getAttribute('href'), 'some value') === false){
    // Copy the anchor into a temporary document and serialize it as HTML.
    $temp = new DOMDocument();
    $temp->appendChild($temp->importNode($node, true));
    $anchorHtml = $temp->saveHTML();
    //var_dump($anchorHtml);
    $anchors[] = $anchorHtml;
}
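As an alternative to the temporary document, DOMDocument::saveHTML() accepts an optional node argument (PHP 5.3.6 and later) and serializes just that node, tag included. The sketch below also tests strpos() against false, because it returns an integer position or false and never true. The input HTML and the blocked.example filter string are hypothetical.

```php
<?php
// Hypothetical input and filter substring.
$html = '<p><a href="https://keep.example/">Keep me</a> <a href="https://blocked.example/x">Drop me</a></p>';
$exclude = 'blocked.example';

$dom = new DOMDocument();
$dom->loadHTML($html);

$anchors = [];
foreach ($dom->getElementsByTagName('a') as $node) {
    // Keep anchors whose href does NOT contain the excluded substring.
    if (strpos($node->getAttribute('href'), $exclude) === false) {
        // Passing a node to saveHTML() serializes only that node, including its tag.
        $anchors[] = $dom->saveHTML($node);
    }
}
print_r($anchors);
```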

get value of <h2> of html page with PHP DOM?

I have a variable $link holding an HTTP link to a Craigslist page, and I put the page's contents into $linkhtml, so $linkhtml holds the HTML code of that page.
I need to extract the text between <h2> and </h2>. I could use a regexp, but how do I do this with PHP DOM? I have this so far:
$linkhtml= file_get_contents($link);
$dom = new DOMDocument;
@$dom->loadHTML($linkhtml);
What do I do next to put the contents of the element <h2> into a var $title?
If DOMDocument seems complicated to understand or use, you could try PHP Simple HTML DOM Parser, which offers a very simple way to parse HTML.
require 'simple_html_dom.php';
$html = '<h1>Header 1</h1><h2>Header 2</h2>';
$dom = new simple_html_dom();
$dom->load( $html );
$title = $dom->find('h2',0)->plaintext;
echo $title; // outputs: Header 2
You can use this code:
$linkhtml= file_get_contents($link);
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($linkhtml); // loads your html
$xpath = new DOMXPath($doc);
$h2text = $xpath->evaluate("string(//h2/text())");
// $h2text holds the first text node found directly inside an <h2>
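Here is a self-contained sketch of the XPath approach, with an inline string standing in for the Craigslist page (the markup is hypothetical). It contrasts two expressions. string(//h2/text()) returns only the first text node directly inside an <h2>. string(//h2) returns the full text of the first <h2>, including text inside nested elements.

```php
<?php
// Hypothetical stand-in for $linkhtml; the real page would come from file_get_contents($link).
$linkhtml = '<html><body><h2>Bike for sale <span>$50</span></h2><h2>Second</h2></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($linkhtml);
$xpath = new DOMXPath($doc);

// First text node directly inside an <h2>: the nested <span> text is not included.
$firstTextNode = $xpath->evaluate('string(//h2/text())');

// String value of the first <h2>: concatenates all of its descendant text.
$title = trim($xpath->evaluate('string(//h2)'));

var_dump($firstTextNode, $title);
```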
You can do this with XPath (untested, may contain errors):
$linkhtml= file_get_contents($link);
$dom = new DOMDocument;
@$dom->loadHTML($linkhtml);
$xpath = new DOMXpath($dom);
$elements = $xpath->query("/html/body/h2");
if ($elements !== false) { // query() returns false (not null) on an invalid expression
foreach ($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
