Use PHP to get links with an attribute from an HTML file - php

Here is my HTML
home
home
home
And PHP
I am trying to get all a tags that have a "title" attribute, but it doesn't work. This is what I have tried:
$html = file_get_contents('home.html');
$dom = new DOMDocument;
@$dom->loadHTML($html);
$links = $dom->getElementsByTagName('a');
foreach ($links as $link)
{
if ($link->getAttribute('name') == "title")
{
echo $link->getAttribute('href'). ' ';
echo $link->nodeValue. '<p>';
}
}
But the output is blank. How do I fix it?

getAttribute extracts the value of a named attribute, e.g.:
<a href="foo.html" name="bar">
$node->getAttribute('href'); // returns "foo.html"
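Your code asks for an attribute called "name" and compares its value to "title". To test whether an attribute exists at all, you want
$node->hasAttribute('title');
e.g.
<a href="foo.html"> $node->hasAttribute('name') -> false
<a href="foo.html" name="foo"> $node->hasAttribute('name') -> true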
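Applied to the question, the loop could look like the sketch below. The original home.html is not shown, so the markup here is a made-up stand-in, and the $found array is only there to make the result easy to inspect.

```php
<?php
// Hypothetical sample markup; the real home.html from the question is not available.
$html = '<a href="a.html" title="Home">home</a>'
      . '<a href="b.html">about</a>'
      . '<a href="c.html" title="">contact</a>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$found = [];
foreach ($dom->getElementsByTagName('a') as $link) {
    // hasAttribute() only checks presence, so an empty title="" also counts
    if ($link->hasAttribute('title')) {
        $found[] = $link->getAttribute('href');
    }
}

print_r($found);
```

If links with an empty title should be skipped, compare getAttribute('title') against '' instead.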

Related

get value of href inside of div from external site using PHP

good day Sir/Maam.
I have a certain html attribute that I want to search from the external website
I want to get the a href value but the problem is the id or class or name is random.
<div class="static">
Dynamic
</div>
This code should display all the hrefs in http://example.com
In this case I use DOMDocument and XPath to select the elements you want to access because it's very flexible and easy to use.
<?php
$html = file_get_contents("http://example.com");
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DomXPath($doc);
$nodeList = $xpath->query("//a/@href");
print_r($nodeList);
// To access the values inside nodes
foreach($nodeList as $node){
echo "<p>" . $node->nodeValue . "</p>";
}
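The query above returns every href on the page. To narrow it to the div from the question, the path can start at the parent, whose class is fixed, rather than at the anchor, whose id and class are random. A minimal sketch, assuming the anchor is a direct child of the div; the markup and the $hrefs array are invented for illustration:

```php
<?php
// Hypothetical markup standing in for the fetched page.
$html = '<div class="static"><a class="x9f3" href="/target">Dynamic</a></div>'
      . '<div class="other"><a href="/ignore">Other</a></div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$hrefs = [];
// Select the href attribute of anchors that are direct children of div.static
foreach ($xpath->query('//div[@class="static"]/a/@href') as $attr) {
    $hrefs[] = $attr->nodeValue;
}

print_r($hrefs);
```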
Use jQuery to get the value as follows:
var link = $(".static>a").attr("href");
You can use PHP DOMDocument:
<?php
$exampleurl = "http://YourDomain.com"; //set your url
$filterClass = "dynamicclass";
$dom = new DOMDocument('1.0');
@$dom->loadHTMLFile($exampleurl);
$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $element) {
$href = $element->getAttribute('href'); // all href
$class = $element->getAttribute('class');
if($class==$filterClass){
echo $href;
}
}
?>

How to replace getElementsByTagName() By document.getElementsById()

I have this code and I want to get the link of an image stored on a website by its id, but the code uses getElementsByTagName(''):
<?php
$html = file_get_contents('http://example.com/dir/webpage.html');
$dom = new DOMDocument;
@$dom->loadHTML($html);
$links = $dom->getElementsByTagName('img');
foreach ($links as $link){
echo $link->nodeValue;
echo $link->getAttribute('href'), '<br>';
}
?>
And The HTML is:
<a href="/images/image1.png" id="img_1_id">
<div class="download"></div>
</a>
I want to replace getElementsByTagName('img') with document.getElementsByById(img_1_id)
so that the script gets the URL of the selected image with the id img_1_id.
If there is another way / code to do this, please post it :)
Thank you pros!
getElementById returns a single element, so you don't need a loop.
$link = $dom->getElementById('img_1_id');
echo $link->nodeValue;
echo $link->getAttribute('href');
BTW, img elements don't have an href attribute, they have src. They also don't have anything in their nodeValue, since <img> is not a container element.
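Putting those points together, a self-contained sketch might look like this. It uses the anchor markup from the question as an inline string instead of fetching the page, and adds a null check because getElementById returns null when no element has that id:

```php
<?php
// Markup copied from the question; a real script would fetch it with file_get_contents().
$html = '<a href="/images/image1.png" id="img_1_id"><div class="download"></div></a>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$href = null;
$link = $dom->getElementById('img_1_id');
if ($link !== null) {
    // The id sits on an <a>, so href is the attribute to read
    $href = $link->getAttribute('href');
    echo $href, "\n";
}
```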
You have to put the id in quotes:
document.getElementById("img_1_id");
so that you get the element with id = "img_1_id".
What about this?
<?php
$html = file_get_contents('http://example.com/dir/webpage.html');
$dom = new DOMDocument;
@$dom->loadHTML($html);
// getElementById returns a single element (or null), so no loop is needed
$link = $dom->getElementById('img_1_id');
if ($link !== null) {
    echo $link->nodeValue;
    echo $link->getAttribute('href'), '<br>';
}
?>

How can i get the text from a child node with php DOMDocument

I've been writing PHP code to get information from a site. So far I was able to get the href attribute, but I can't find a way to get the text from the child node "span". Can someone help me?
HTML:
<a class="js-publication" href="publication/247931167">
<span class="publication-title">An approach for textual authoring</span>
</a>
This is how I am currently able to get the href:
@$dom->loadHTMLFile($curPage);
$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $element) {
$class_ = $element->getAttribute('class');
if (0 !== strpos($class_, 'js-publication')) {
$href = $element->getAttribute('href');
if(0 === stripos($href,'publication/')){
echo $href; // link to the publication
echo "\n";
}
}
}
You can use DOMXpath
$html = <<< LOL
<a class="js-publication" href="publication/247931167">
<span class="publication-title">An approach for textual authoring</span>
</a>
LOL;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
foreach ($xpath->query("//a[@class='js-publication']") as $element){
echo $element->getAttribute('href');
echo $element->textContent;
}
//publication/247931167
//An approach for textual authoring
Or without the for loop, if you just want one element :
echo $xpath->query("//a[@class='js-publication']/span")[0]->textContent;
echo $xpath->query("//a[@class='js-publication']")[0]->getAttribute('href');
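One caveat: a test on the class attribute with = only matches when the attribute value is exactly "js-publication". If the anchor may carry additional classes, a common XPath 1.0 idiom is to pad the class list with spaces and look for the token. The sketch below assumes a second class ("featured") that is not in the original markup, purely to show the difference:

```php
<?php
// Hypothetical markup: the anchor carries an extra class next to js-publication.
$html = '<a class="js-publication featured" href="publication/247931167">'
      . '<span class="publication-title">An approach for textual authoring</span></a>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Pad the class list with spaces so js-publication matches only as a whole token
$query = "//a[contains(concat(' ', normalize-space(@class), ' '), ' js-publication ')]"
       . "/span[@class='publication-title']";

$titles = [];
foreach ($xpath->query($query) as $span) {
    $titles[] = trim($span->textContent);
}

print_r($titles);
```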

How to get value of onclick= using xpath?

I have a string that has lots of <li> sets of data. I want to get this value:
1: call.php?category=fruits&fruitid=123456
inside onclick using XPath. My current XPath doesn't return the onclick value, which I need so I can parse it further to get the required data. Could anyone tell me the correct XPath to get the value of onclick?
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($code2);
$xpath = new DOMXPath($dom);
// Empty array to hold all links to return
$result = array();
//Loop through each <li> tag in the dom
foreach($dom->getElementsByTagName('li') as $li) {
//Loop through each <a> tag within the li, then extract the node value
foreach($li->getElementsByTagName('a') as $links){
$result[] = $links->nodeValue;
echo $result[0] . "\n";
}
$onclicks = $xpath->query("//li/a/onclick");
foreach ($onclicks as $onclick) {
echo $onclick->nodeValue . "\n";
}
}
data:
<li><a id="FR123456" onclick="setFood(false);setSeasonFruitID('123456');getit('call.php?category=fruits&fruitid=123456&',detailFruit,false);">mango season</a><img src="http://imagehosting.com/images/fru_123456.png">
</li>
onclick is an attribute, and you use @attribute_name to reference an attribute in XPath:
$onclicks = $xpath->query("//li/a/@onclick");
foreach ($onclicks as $onclick) {
echo $onclick->nodeValue . "\n";
}
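That gives the whole onclick string. To pull out just the call.php URL, one option is a regular expression over the attribute value. This is only a sketch: it assumes the URL is always the first quoted argument of getit(), and the sample below escapes the ampersands as &amp;amp; so the HTML parses cleanly. The $urls array is introduced for illustration.

```php
<?php
// Sample data based on the question, with ampersands escaped as &amp;.
$code2 = <<<'HTML'
<li><a id="FR123456" onclick="setFood(false);setSeasonFruitID('123456');getit('call.php?category=fruits&amp;fruitid=123456&amp;',detailFruit,false);">mango season</a></li>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($code2);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$urls = [];
foreach ($xpath->query('//li/a/@onclick') as $onclick) {
    // Capture the first quoted argument of getit(), dropping a trailing "&" if present
    if (preg_match("/getit\\('([^']+?)&?'/", $onclick->nodeValue, $m)) {
        $urls[] = $m[1];
    }
}

print_r($urls);
```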
Try something like this:
$links = $xpath->query("//li/a");
foreach ($links as $link) {
    echo $link->getAttribute('onclick') . "\n";
}

html DOM program to find href value

I am a newbie in PHP and I have been assigned a project to fetch the HREF value from the following HTML snippet:
<p class="title">
<a href="http://canon.com/">Canon Pixma iP100 + Accu Kit
</a>
</p>
For this I am using the following code:
$dom = new DOMDocument();
@$dom->loadHTML($html);
foreach($dom->getElementsByTagName('p') as $link) {
# Show the <a href>
foreach($link->getElementsByTagName('a') as $link)
{
echo $link->getAttribute('href');
echo "<br />";
}
}
This code gives me the HREF value of every <a href> inside every <P> tag on that page. I want to parse only the <P> with the class "title". I can't use Simple_HTML_DOM or any other library here.
Thanks in advance.
Alternatively, you could use DOMXpath for this one. Like this:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
// target anchor tags directly inside p tags whose class is "title"
$target_element = $xpath->query('//p[@class="title"]/a');
if($target_element->length > 0) {
foreach($target_element as $link) {
echo $link->getAttribute('href'); // http://canon.com/
}
}
Or, if you want to traverse the tree yourself, you have to do the search manually:
foreach($dom->getElementsByTagName('p') as $p) {
// if p tag has a "title" class
if($p->getAttribute('class') == 'title') {
foreach($p->childNodes as $child) {
// if the child is an anchor element (text nodes have no tagName, so check the type first)
if($child instanceof DOMElement && $child->tagName == 'a' && $child->hasAttribute('href')) {
echo $child->getAttribute('href'); // http://canon.com/
}
}
}
}
