html DOM program to find href value - php

I am new to PHP and have been assigned a project that needs to fetch the href value from the following HTML snippet:
<p class="title">
<a href="http://canon.com/">Canon Pixma iP100 + Accu Kit
</a>
</p>
For this I am using the following code:
$dom = new DOMDocument();
@$dom->loadHTML($html);
foreach($dom->getElementsByTagName('p') as $link) {
# Show the <a href>
foreach($link->getElementsByTagName('a') as $link)
{
echo $link->getAttribute('href');
echo "<br />";
}
}
This code gives me the href value of every <a> inside every <p> tag on the page. I want to parse only the <p> with the class "title". I can't use Simple_HTML_DOM or any other library here.
Thanks in advance.

Alternatively, you could use DOMXPath for this:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
// target anchor tags that are direct children of p tags with the class "title"
$target_element = $xpath->query('//p[@class="title"]/a');
if($target_element->length > 0) {
foreach($target_element as $link) {
echo $link->getAttribute('href'); // http://canon.com/
}
}
Or, if you want to traverse the tree yourself, you need to search it manually:
foreach($dom->getElementsByTagName('p') as $p) {
// if p tag has a "title" class
if($p->getAttribute('class') == 'title') {
foreach($p->childNodes as $child) {
// if the child is an anchor element (nodeName is also safe on text nodes, which have no tagName)
if($child->nodeName == 'a' && $child->hasAttribute('href')) {
echo $child->getAttribute('href'); // http://canon.com/
}
}
}
}
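One caveat with both approaches above: they compare the whole class attribute, so a tag like <p class="title featured"> would be skipped. A minimal sketch of a token-based match follows, assuming sample markup that I made up for illustration (the second and third <p> tags are not from the question):

```php
<?php
// Assumed sample markup: the second <p> carries an extra class next to "title".
$html = '<p class="title"><a href="http://canon.com/">Canon Pixma iP100 + Accu Kit</a></p>'
      . '<p class="title featured"><a href="http://example.com/">Other</a></p>'
      . '<p class="subtitle"><a href="http://example.org/">Skip me</a></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Pad the class attribute with spaces so " title " only matches a whole class token,
// not a substring such as "subtitle".
$query = '//p[contains(concat(" ", normalize-space(@class), " "), " title ")]/a';

$hrefs = [];
foreach ($xpath->query($query) as $a) {
    $hrefs[] = $a->getAttribute('href');
}

echo implode("\n", $hrefs), "\n";
```

If the class attribute is always exactly "title", the simpler @class="title" test is enough.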

Related

DOMDocument Check if html code is in <h2> tag

I have a function that gets all <h2> elements using DOMDocument.
Now I want to check whether there is any HTML tag inside <h2>[here]</h2>; if there is, skip that <h2> and move on to the next one.
My code:
foreach ($DOM->getElementsByTagName('*') as $element) {
if ($element->tagName == 'h2') {
$h = $element->textContent;
}
}
I think the easiest thing is to just reuse getElementsByTagName("*") on the element and count how many items are found.
$html = <<<EOT
<html><body><h2>Hello</h2> <h2>World</h2><h2><strong>!</strong></h2></body></html>
EOT;
$dom = new DOMDocument();
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('h2') as $h2) {
if(!count($h2->getElementsByTagName('*'))){
var_dump($h2->textContent);
}
}
Demo here: https://3v4l.org/dI1e4
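The same filter can probably also be pushed into an XPath query, so only the <h2> elements without child elements come back. This is a sketch using the same sample HTML as above; the predicate not(*) is my choice here, not something the answer above used:

```php
<?php
$html = '<html><body><h2>Hello</h2> <h2>World</h2><h2><strong>!</strong></h2></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// not(*) keeps only the h2 elements that have no element children.
$plain = [];
foreach ($xpath->query('//h2[not(*)]') as $h2) {
    $plain[] = $h2->textContent;
}

print_r($plain);
```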

get value of href inside of div from external site using PHP

Good day.
I want to find a certain HTML attribute on an external website.
I want to get the href value of an <a> tag, but the problem is that its id, class, or name is random.
<div class="static">
Dynamic
</div>
This code should display all the hrefs on http://example.com.
It uses DOMDocument and XPath to select the elements you want to access, because that combination is flexible and easy to use.
<?php
$html = file_get_contents("http://example.com");
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DomXPath($doc);
$nodeList = $xpath->query("//a/@href");
print_r($nodeList);
// To access the values inside nodes
foreach($nodeList as $node){
echo "<p>" . $node->nodeValue . "</p>";
}
Use jQuery to get the value as follows:
var link = $(".static>a").attr("href");
You can use PHP DOMDocument:
<?php
$exampleurl = "http://YourDomain.com"; //set your url
$filterClass = "dynamicclass";
$dom = new DOMDocument('1.0');
@$dom->loadHTMLFile($exampleurl);
$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $element) {
$href = $element->getAttribute('href'); // all href
$class = $element->getAttribute('class');
if($class==$filterClass){
echo $href;
}
}
?>
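Since the anchor's own class is random but the wrapping div's class is fixed, it may be simpler to select through the parent, much like the jQuery selector above does. A minimal sketch follows, assuming the anchor sits directly inside <div class="static">; the inline HTML, the /page-1 URL, and the x9f3 class are made up for illustration:

```php
<?php
// Assumed markup: the anchor's class is random, the parent div's class is not.
$html = '<div class="static"><a href="/page-1" class="x9f3">Dynamic</a></div>'
      . '<div class="other"><a href="/ignored">Other</a></div>';

// Collect parser warnings instead of printing them; real-world pages are rarely valid HTML.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select the href attribute of anchors that are direct children of div.static.
$hrefs = [];
foreach ($xpath->query('//div[@class="static"]/a/@href') as $attr) {
    $hrefs[] = $attr->nodeValue;
}

print_r($hrefs);
```

For a live page, $html would come from file_get_contents() as in the first answer.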

How can i get the text from a child node with php DOMDocument

I've been writing PHP code to get information from a site. So far I have been able to get the href attribute, but I can't find a way to get the text from the child node "span". Can someone help me?
HTML:
<a class="js-publication" href="publication/247931167">
<span class="publication-title">An approach for textual authoring</span>
</a>
This is how I am currently getting the href:
@$dom->loadHTMLFile($curPage);
$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $element) {
$class_ = $element->getAttribute('class');
if (0 !== strpos($class_, 'js-publication')) {
$href = $element->getAttribute('href');
if(0 === stripos($href,'publication/')){
echo $href; // link to the publication
echo "\n";
}
}
}
You can use DOMXPath:
$html = <<< LOL
<a class="js-publication" href="publication/247931167">
<span class="publication-title">An approach for textual authoring</span>
</a>
LOL;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
foreach ($xpath->query("//a[@class='js-publication']") as $element){
echo $element->getAttribute('href');
echo $element->textContent;
}
//publication/247931167
//An approach for textual authoring
Or, without the loop, if you just want one element:
echo $xpath->query("//a[@class='js-publication']/span")->item(0)->textContent;
echo $xpath->query("//a[@class='js-publication']")->item(0)->getAttribute('href');

Use php Get links with attribute from a html file

Here is my HTML
home
home
home
And Php
I am trying to get all <a> tags that have a "title" attribute, but it doesn't work. This is what I have tried:
$html = file_get_contents('home.html');
$dom = new DOMDocument;
@$dom->loadHTML($html);
$links = $dom->getElementsByTagName('a');
foreach ($links as $link)
{
if ($link->getAttribute('name') == "title")
{
echo $link->getAttribute('href'). ' ';
echo $link->nodeValue. '<p>';
}
}
But it shows blank output. How do I fix it?
getAttribute extracts the value of a named attribute, e.g.:
<a href="foo.html" name="bar">
$node->getAttribute('href'); // returns "foo.html"
You want
$node->hasAttribute('title');
e.g.
<a href="foo.html"> $node->hasAttribute('name') -> false
<a href="foo.html" name="foo"> $node->hasAttribute('name') -> true
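Applied to the loop from the question, that could look roughly like the sketch below. The original home.html markup did not survive, so the two anchors here are assumed sample data:

```php
<?php
// Assumed sample markup: only the first anchor has a title attribute.
$html = '<a href="home.html" title="Home page">home</a>'
      . '<a href="about.html">about</a>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$found = [];
foreach ($dom->getElementsByTagName('a') as $link) {
    // hasAttribute() tests for presence; getAttribute() then reads the value.
    if ($link->hasAttribute('title')) {
        $found[$link->getAttribute('href')] = $link->getAttribute('title');
    }
}

print_r($found);
```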

Extracting href attribute and the value using php dom parser

From the given markup I have to extract every hyperlink together with its title:
<span></span>
<span>Chapter1</span>
<span>Chapter2</span>
<span>Chapter3</span>
For this I've written the following code, but it's not working:
$doc = new DOMDocument();
$doc->loadHTML($page_links);
$tags = $doc->getElementsByTagName('span');
foreach ($tags as $tag) {
echo '\n'.$tag->nodeValue;
if($tag->hasChildNodes()) {
echo $tag->childNodes->getAttribute('href');
} else {
echo 'default.htm';
}
}
I am expecting this output:
Chapter1 default.htm
Chapter2 page2.htm
Chapter3 page3.htm
and so on
Could you please try this?
$doc = new DOMDocument();
$doc->loadHTML($page_links);
$tags = $doc->getElementsByTagName('span');
for($i=0;$i<$tags->length;$i++){
echo $tags->item($i)->nodeValue;
if($tags->item($i)->hasChildNodes()) {
if($tags->item($i)->firstChild->nodeName=='a'){
echo " ".$tags->item($i)->firstChild->getAttribute('href').'<br/>';
}else{
echo " default.htm<br/>";
}
}
}
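A variant of the same idea, as a sketch: look up the first <a> anywhere inside each span rather than relying on firstChild, and fall back to default.htm when there is none. The anchors in the question's markup were lost, so the $page_links value below is an assumption reconstructed from the expected output:

```php
<?php
// Assumed markup, reconstructed from the expected output in the question.
$page_links = '<span></span>'
            . '<span>Chapter1</span>'
            . '<span><a href="page2.htm">Chapter2</a></span>'
            . '<span><a href="page3.htm">Chapter3</a></span>';

$doc = new DOMDocument();
$doc->loadHTML($page_links);

$rows = [];
foreach ($doc->getElementsByTagName('span') as $span) {
    $title = trim($span->textContent);
    if ($title === '') {
        continue; // skip empty spans
    }
    // item(0) returns null when the span contains no anchor.
    $anchor = $span->getElementsByTagName('a')->item(0);
    $href = $anchor !== null ? $anchor->getAttribute('href') : 'default.htm';
    $rows[] = $title . ' ' . $href;
}

echo implode("\n", $rows), "\n";
```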
