PHP DOM Get Links Inside DIV - php

I'm trying to iterate through DIVs and collect all of the links from each one. I'd like to put the result in an array, e.g.:
[Astronomy] // div @class=container
[link] http://www.nasa.gov
[link] http://www.seti.org
[Biology] // div @class=container
[link] http://www.biology.com
[Chemistry] // div @class=container
[link] http://www.chemistry.com
I can use DOM to get the text content inside the DIVs, but I can't figure out how to get the href attribute of the nodes inside each DIV; getAttribute isn't a method of DOMNode. How can I iterate through the 'a' elements inside the result of an existing XPath query?
$dom_document = new DOMDocument();
$dom_document->loadHTML($html);
$dom_xpath = new DOMXpath($dom_document);
$elements = $dom_xpath->query("*/div[@class='container']");
foreach($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
// ??? $links = $dom_xpath->query("//a");
}
}

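Use $element->getElementsByTagName('a') instead of $element->childNodes. childNodes returns only direct children, and those are generic DOMNode objects (text nodes included); getElementsByTagName returns every descendant 'a' element as a DOMElement, which does have getAttribute.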
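As a minimal sketch of that suggestion, the following builds the array described in the question. The sample markup, the use of an h2 element as the section name, and the variable $links_by_section are assumptions made for illustration; the original page structure is not shown in the question.

```php
<?php
// Hypothetical sample markup: each container div starts with an <h2> heading.
$html = <<<HTML
<html><body>
<div class="container"><h2>Astronomy</h2>
  <a href="http://www.nasa.gov">NASA</a>
  <p><a href="http://www.seti.org">SETI</a></p>
</div>
<div class="container"><h2>Biology</h2>
  <a href="http://www.biology.com">Biology</a>
</div>
</body></html>
HTML;

$dom_document = new DOMDocument();
$dom_document->loadHTML($html);
$dom_xpath = new DOMXPath($dom_document);

$links_by_section = array();
foreach ($dom_xpath->query("//div[@class='container']") as $element) {
    // Assumption: the section name is the text of the first <h2> in the div.
    $heading = $element->getElementsByTagName('h2')->item(0);
    $name = $heading ? trim($heading->textContent) : '';
    $links_by_section[$name] = array();

    // getElementsByTagName() searches all descendants, not just direct children.
    foreach ($element->getElementsByTagName('a') as $a) {
        $links_by_section[$name][] = $a->getAttribute('href');
    }
}

print_r($links_by_section);
```

An alternative that stays within XPath is to pass the div as the context node of a relative query, e.g. $dom_xpath->query('.//a', $element). The leading dot matters: a plain //a would search the whole document again.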

Related

Extract all the 'a' tags that contain an 'img' tag, using PHP

Here is the code snippet being used:
$urlContent = file_get_contents('http://www.techeblog.com/');
$dom = new DOMDocument();
@$dom->loadHTML($urlContent);
$domPath=new DOMXpath($dom);
$linkList = $domPath->evaluate("/html/body/a/img");
foreach ($linkList as $link)
{
echo $link->getAttribute("src")."<br />";
}
Need to extract all the links in which the child node is an image tag.
Your XPath expression will only return image tags that are inside links that are direct children of the body tag. If you want all link tags that contain images anywhere in the document, use the expression //a[img]
That being said, you may want to be more specific about which images you pull. This expression limits the results to links containing images that are inside the blog entries: //div[@class="entry"]//a[img]
Here is a great XPath cheat sheet.
<?php
$urlContent = file_get_contents('http://www.techeblog.com/');
$dom = new DOMDocument();
@$dom->loadHTML($urlContent);
$domPath=new DOMXpath($dom);
$linkList = $domPath->evaluate('//div[@class="entry"]//a[img]');
foreach ($linkList as $link)
{
echo $link->getAttribute("href").PHP_EOL;
}
Also, your echo is looking for an attribute called src, which is present on the img elements but not on the links; for a link, read href instead.
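To see the difference between the two expressions without fetching a live page, here is a small sketch run against inline markup. The HTML string and the variable names $all and $entries are made up for illustration.

```php
<?php
// Hypothetical markup standing in for the fetched page.
$html = '<html><body>'
      . '<div class="entry"><a href="/post-1"><img src="a.png" alt=""></a>'
      . '<a href="/text-only">No image here</a></div>'
      . '<div class="sidebar"><a href="/ad"><img src="ad.png" alt=""></a></div>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$domPath = new DOMXPath($dom);

// Every link that has an <img> as a direct child, anywhere in the document.
$all = array();
foreach ($domPath->query('//a[img]') as $link) {
    $all[] = $link->getAttribute('href');
}

// Only such links inside <div class="entry">.
$entries = array();
foreach ($domPath->query('//div[@class="entry"]//a[img]') as $link) {
    $entries[] = $link->getAttribute('href');
}

print_r($all);     // /post-1 and /ad
print_r($entries); // /post-1 only
```

Note that a[img] only matches when the image is a direct child of the link; a[.//img] would also match images nested deeper inside it.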

Remove specific class domdocument

I have this in dom document:
<div><span class="hello"></span></div>
And I want to remove all spans with that class. So I tried:
$xpath = new DOMXPath($doc);
$elements = $xpath->query('//span[@class="hello"]/..');
foreach ($elements as $el) {
$el->parentNode->removeChild($el);
}
But this removes the parent element as well (the div element). How can I only remove the span elements?
The /.. at the end of your XPath selector is selecting the parent element, not the <span> itself - .. means to work one level back up the tree, the same as in a directory path. So in your loop, where you say parentNode->removeChild, you're actually removing the div, since $el is already the span's parent element.
If you just remove the /.. from the end of the selector, the code should work as intended.
$xpath = new DOMXPath($doc);
$elements = $xpath->query('//span[@class="hello"]');
foreach ($elements as $el) {
$el->parentNode->removeChild($el);
}
Full example: https://3v4l.org/o4dRv
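For a side-by-side check of the two selectors, here is a short sketch. The helper name removeMatches and the sample markup are hypothetical; it simply runs the same removal loop with each selector on a fresh document and counts what is left.

```php
<?php
// Hypothetical helper: load a small sample document, remove every node
// matched by $selector, and return the modified document.
function removeMatches($selector) {
    $doc = new DOMDocument();
    $doc->loadHTML('<html><body><div id="box"><span class="hello">x</span><p>kept</p></div></body></html>');
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query($selector) as $el) {
        $el->parentNode->removeChild($el);
    }
    return $doc;
}

$wrong = removeMatches('//span[@class="hello"]/..'); // selects the parent div
$right = removeMatches('//span[@class="hello"]');    // selects the span itself

echo $wrong->getElementsByTagName('div')->length, PHP_EOL;  // 0: the div was removed
echo $right->getElementsByTagName('div')->length, PHP_EOL;  // 1: the div is still there
echo $right->getElementsByTagName('span')->length, PHP_EOL; // 0: only the span is gone
```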

Get next 17 letters after keyword

I have this keyword: yt-lookup-title.
I want the next 17 letters after this in a variable. So I would have:
"<a href="/watch?v=HnlC81tWoY8"
How can I get this from every line that contains the keyword?
If you want the href content, you can rely on DOMDocument rather than counting characters.
If I'm not mistaken, all the links (<a>) have the class yt-uix-tile-link, so you can do the following:
$dom = new DOMDocument;
// $html is a string containing the html of the page you're parsing
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$links = array();
$nodes = $xpath->query('//a[@class="yt-uix-tile-link"]/@href');
foreach ($nodes as $node) {
$links [] = $node->nodeValue;
}
var_dump ($links);
Hope that helps
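Here is a self-contained sketch of the same idea that can be run without the real page. The markup below is invented for illustration (only the first href comes from the question; the second is a placeholder), and it assumes the class attribute is exactly yt-uix-tile-link, since @class="..." compares the whole attribute value.

```php
<?php
// Hypothetical markup; the second video id is a placeholder, not a real one.
$html = '<html><body>'
      . '<h3><a class="yt-uix-tile-link" href="/watch?v=HnlC81tWoY8">First</a></h3>'
      . '<h3><a class="yt-uix-tile-link" href="/watch?v=placeholder">Second</a></h3>'
      . '<a class="other" href="/about">About</a>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$links = array();
// The trailing /@href selects the attribute nodes themselves,
// so nodeValue is already the URL string.
foreach ($xpath->query('//a[@class="yt-uix-tile-link"]/@href') as $node) {
    $links[] = $node->nodeValue;
}

var_dump($links);
```

If the real elements carry several classes at once, an exact comparison will not match; contains(concat(" ", normalize-space(@class), " "), " yt-uix-tile-link ") is the usual XPath 1.0 workaround.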

Get just the first item with DOMDocument in PHP

I am using the code below to get the elements that have a particular class:
$dom = new DOMDocument();
@$dom->loadHTML($google_html);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//span[@class="st"]');
foreach ($tags as $tag) {
echo $tag->nodeValue;
}
}
The problem is that this code returns every element with that class, but I only need the first one, so I shouldn't need a foreach loop at all.
How can I change the code to get just the first item?
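The following returns a DOMNodeList containing only the first matching element in the document:
$xpath->query('(//span[@class="st"])[1]');
The parentheses matter: without them, //span[@class="st"][1] selects every span that is the first matching child of its own parent, which can be more than one node.
To read the text of that single item:
$tags = $xpath->query('(//span[@class="st"])[1]');
$first = $tags->item(0);
$text = $first->textContent;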
See XPath: Select first element with a specific attribute
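The effect of the position predicate is easy to misread, so here is a small sketch comparing the two spellings. In XPath 1.0, [1] after //span[...] is evaluated per parent, while (...)[1] is evaluated over the whole result set. The sample markup and variable names are made up for illustration.

```php
<?php
// Hypothetical markup: two matching spans, each in a different parent.
$html = '<html><body>'
      . '<div><span class="st">first</span></div>'
      . '<div><span class="st">second</span></div>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// First matching span *within each parent*: both spans qualify here.
$perParent = $xpath->query('//span[@class="st"][1]');

// First matching span in the whole document.
$overall = $xpath->query('(//span[@class="st"])[1]');

echo $perParent->length, PHP_EOL; // 2
echo $overall->length, PHP_EOL;   // 1

// item(0) returns null on an empty list, so guard before reading the text.
$first = $overall->item(0);
$text = $first ? $first->textContent : null;
echo $text, PHP_EOL;              // first
```

Calling item(0) on the unrestricted list gives the same node; the parenthesized form just keeps the list itself down to one entry.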

convert part of dom element to string with html tags inside of them

I need to convert part of a DOM document to a string, keeping the HTML tags inside it.
I tried the following, but it prints just the text without the tags.
$dom = new DOMDocument();
$dom->loadHTMLFile('http://www.pixmania-pro.co.uk/gb/uk/08920684/art/packard-bell/easynote-tm89-gu-015uk.html');
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//table');
foreach($elements as $element)
echo $element->nodeValue;
I want the tables output with all of their tags and content intact. Can someone help me? It would be a great help.
Thanks.
Current solution:
foreach($elements as $element){
echo $dom->saveHTML($element);
}
Old answer (PHP < 5.3.6):
Create a new instance of DOMDocument.
Clone the node (with all of its sub-nodes) that you wish to save as HTML.
Import the cloned node into the new instance of DOMDocument and append it as a child.
Save the new instance as HTML.
So something like this:
foreach($elements as $element){
$newdoc = new DOMDocument();
$cloned = $element->cloneNode(TRUE);
$newdoc->appendChild($newdoc->importNode($cloned,TRUE));
echo $newdoc->saveHTML();
}
With PHP 5.3.6 or higher you can pass a node to DOMDocument::saveHTML:
foreach($elements as $element){
echo $dom->saveHTML($element);
}
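To make the difference between nodeValue and saveHTML($node) concrete, here is a small sketch against inline markup rather than the remote page. The table content and the variables $text and $markup are made up for illustration, and it assumes PHP 5.3.6 or later.

```php
<?php
// Hypothetical table standing in for the one on the remote page.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><table><tr><td><b>Price</b></td><td>100</td></tr></table></body></html>');
$xpath = new DOMXPath($dom);

$text = '';
$markup = '';
foreach ($xpath->query('//table') as $element) {
    // nodeValue concatenates the text of all descendants and drops the tags.
    $text .= $element->nodeValue;
    // saveHTML($node) serializes the node itself, tags included.
    $markup .= $dom->saveHTML($element);
}

echo $text, PHP_EOL;
echo $markup, PHP_EOL;
```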
