Extracting and printing an HTML element by its class using DOMDocument

What I want to do is get an element by its class name and output it as actual HTML markup, not as its nodes or its inner data.
Here is my code:
$html = file_get_contents("www.site.com");
$dom = new DOMDocument('1.0');
$dom->loadHTML($html);
$element = $dom->getElementById('myid');
$string = $element->C14N();
This is how I do it using an ID, but I want to know if there is a way to do this using a class. Apparently there is no getElementByClass method.

There is no built-in method in PHP's DOM extension to do this. You will have to walk all the elements and check whether their class attribute contains the class name you need:
$html = file_get_contents("http://www.site.com");
$dom = new DOMDocument('1.0');
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('div') as $element) {
    // strpos() is a substring test, so it also matches longer names such as 'yourClassNameHere2'
    if (strpos($element->getAttribute('class'), 'yourClassNameHere') !== false) {
        $string = $element->C14N();
    }
}
You can also use DOMXPath:
$xpath = new DOMXPath($dom);
// @class='...' only matches when the whole attribute value equals the class name
foreach ($xpath->query("//div[@class='yourClassNameHere']") as $element) {
    $string = $element->C14N();
}
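
Both ideas can be combined into a small helper. The following is only a sketch: the function name getHtmlByClass and the sample markup are made up for illustration, and it assumes the class name contains no quote characters. It matches the class as a whole space-separated token and uses saveHTML() on the node, which returns HTML markup rather than the canonical XML that C14N() produces.

```php
<?php
// Hypothetical helper (not part of DOMDocument): returns the outer HTML of
// every element whose class attribute contains $class as a whole token.
function getHtmlByClass(DOMDocument $dom, string $class): array
{
    $xpath = new DOMXPath($dom);
    // Pad the class list with spaces so 'box' does not match 'boxed'.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );
    $result = [];
    foreach ($xpath->query($query) as $element) {
        // saveHTML($node) serializes just that node, tags included.
        $result[] = $dom->saveHTML($element);
    }
    return $result;
}

// Sample markup, invented for this example.
$dom = new DOMDocument();
$dom->loadHTML('<div class="box big"><p>Hello</p></div><div class="boxed">No</div>');
$matches = getHtmlByClass($dom, 'box');
print_r($matches);
```

Only the first div should be returned here, because "boxed" is a different class token.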

Related

Create attribute in PHP by selecting tag name

I am trying to load html, find a tag and add an attribute to it, before showing it.
I have tried:
libxml_use_internal_errors(true);
$domDocument->loadHTML("<html><body>Test<br></body></html>");
$domElement = $domDocument->getElementsByTagName('body');
foreach ($domElement as $formula) {
$formula->nodeValue->createAttribute('name')->value = 'attributevalue';
}
libxml_use_internal_errors(false);
But I have this error:
Call to a member function createAttribute() on string
Do you have a solution, please?
Best regards

nodeValue returns a string, so you cannot call createAttribute() on it.
Each item in the node list is a DOMElement, so set the attribute on the element itself, as in the following code:
<?php
$domDocument = new DOMDocument();
$domDocument->loadHTML("<html><body>Test<br></body></html>");
$domElement = $domDocument->getElementsByTagName('body');
foreach ($domElement as $formula) {
    $formula->setAttribute("name", "attributevalue");
}
?>

Try it like this
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML("<html><body>Test<br></body></html>");
$domElements = $doc->getElementsByTagName('body');
foreach ($domElements as $domElement) {
    $domAttribute = $doc->createAttribute('name');
    $domAttribute->value = 'attributevalue';
    $domElement->appendChild($domAttribute);
    print_r($domElement->getAttribute('name'));
    // returns attributevalue
}
libxml_use_internal_errors(false);

Here is a possible solution that uses setAttribute instead of createAttribute. I am not sure what the purpose of the loop is, though, since there is usually only one body tag.
libxml_use_internal_errors(true);
$domDocument = new DOMDocument();
$domDocument->loadHTML("<html><body>Test<br></body></html>");
$domElement = $domDocument->getElementsByTagName('body');
foreach ($domElement as $formula) {
    $formula->setAttribute('name', 'thevalue');
}
libxml_use_internal_errors(false);
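
Since a document normally has a single body, the loop can be replaced by item(0). This is a minimal sketch under that assumption; the variable names $body and $bodyHtml are only illustrative, and the saveHTML() call at the end is there just to check that the attribute was added.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<html><body>Test<br></body></html>');

// item(0) returns the first match, or null when there is none.
$body = $doc->getElementsByTagName('body')->item(0);
if ($body !== null) {
    $body->setAttribute('name', 'attributevalue');
}

// Serialize only the <body> element to confirm the attribute is present.
$bodyHtml = $doc->saveHTML($body);
echo $bodyHtml;
```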

Xpath nodeValue/textContent unable to see <BR> tag

HTML is as follows:
<a>ABC<BR>DEF</a>
However, both nodeValue and textContent attributes show "ABCDEF" as the value.
Any way to show or parse the <BR>?

Maybe this will help you: DOMNode::C14N
It returns the canonical XML serialization of the node, markup included, so the <BR> shows up (as <br></br>).
<?php
$a = '<a>ABC<BR>DEF</a>';
$doc = new DOMDocument();
@$doc->loadHTML($a);
$finder = new DOMXPath($doc);
$nodes = $finder->query("//a");
foreach ($nodes as $node) {
    var_dump($node->C14N());
}

I know you have already solved your problem, but I wanted to add a more direct way of solving it: select the child nodes of the <a> element and serialize each one with saveHTML().
$a = '<a>ABC<BR>DEF</a>';
$doc = new DOMDocument();
$doc->loadHTML($a);
$xp = new DOMXPath($doc);
$nodes = $xp->query("//a/node()");
$text = '';
foreach ($nodes as $node) {
    $text .= $doc->saveHTML($node);
}
echo $text;
Outputs...
ABC<br>DEF

How to query a DOMNode using XPath in PHP?

I'm trying to get the bing search results with XPath. Here is my code:
$html = file_get_contents("http://www.bing.com/search?q=bacon&first=11");
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHtml($html);
$x = new DOMXpath($doc);
$output = array();
// just grab the urls for now
foreach ($x->query("//li[@class='b_algo']") as $node)
{
    //$output[] = $node->getAttribute("href");
    $tmpDom = new DOMDocument();
    $tmpDom->loadHTML($node);
    $tmpDP = new DOMXPath($tmpDom);
    echo $tmpDP->query("//div[@class='b_title']//h2//a//href");
}
return $output;
This foreach iterates over all results. All I want to do is extract the link and text from $node inside the loop, but because $node is itself an object, I can't create a DOMDocument from it. How can I query it?

First of all, your XPath expression tries to match non-existent href subelements; query @href for the attribute instead.
You don't need to create any new DOMDocuments, just pass $node as the context item:
foreach ($x->query("//li[@class='b_algo']") as $node)
{
    var_dump( $x->query("./div[@class='b_title']//h2//a//@href", $node)->item(0) );
}
If you're just interested in the URLs, you could also query them directly:
foreach ($x->query("//li[@class='b_algo']/div[@class='b_title']/h2/a/@href") as $node)
{
    var_dump($node);
}
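
To collect both the link and its text, the same context-node technique can be applied to the <a> element instead of the attribute. This is a sketch against invented markup that only imitates the structure from the question (the example.com URLs and the keys 'url' and 'text' are assumptions), so the real page may need different expressions.

```php
<?php
// Made-up markup that mimics a list of search results.
$html = '<ul>'
      . '<li class="b_algo"><div class="b_title"><h2><a href="http://example.com/a">First</a></h2></div></li>'
      . '<li class="b_algo"><div class="b_title"><h2><a href="http://example.com/b">Second</a></h2></div></li>'
      . '</ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$x = new DOMXPath($doc);

$output = [];
foreach ($x->query("//li[@class='b_algo']") as $node) {
    // The leading "." keeps the search inside the current <li>.
    $link = $x->query(".//div[@class='b_title']//h2//a", $node)->item(0);
    if ($link !== null) {
        $output[] = [
            'url'  => $link->getAttribute('href'),
            'text' => $link->textContent,
        ];
    }
}
print_r($output);
```

Without the leading dot, an expression starting with // would search the whole document again on every iteration, regardless of the context node.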

DOM Parser grabbing href of <a> tag by class="Decision"

I'm working with a DOM parser and I'm having issues. I'm trying to grab the href of every <a> tag that has the class 'thumbnail '. I've been trying to print the links on the screen and still get no results. Any help is appreciated. I also turned on error_reporting(E_ALL); and still nothing.
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$classId = "thumbnail ";
$div = $html->find('a#'.$classId);
echo $div;
I also tried this but still had the same result of NOTHING:
include('simple_html_dom.php');
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
@$dom->loadHTML($html);
// grab all the <a> elements on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
$ret = $html->find('a[class=thumbnail]');
echo $ret;

You were almost there:
<?php
$dom = new DOMDocument();
@$dom->loadHTMLFile('http://www.reddit.com/r/funny');
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a[contains(concat(' ',normalize-space(@class),' '),' thumbnail ')]");
var_dump($hrefs);
Gives:
class DOMNodeList#28 (1) {
public $length =>
int(25)
}
25 matches, I'd call it success.
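
The concat/normalize-space idiom matters because class attributes usually hold several names. The following sketch, run against invented markup, compares an exact match, the token match used above, and a plain substring match; the variable names are illustrative only.

```php
<?php
// Made-up markup: only the first two links carry the "thumbnail" class token.
$html = '<a class="thumbnail" href="/1">one</a>'
      . '<a class="thumbnail may-blank" href="/2">two</a>'
      . '<a class="thumbnail-wide" href="/3">three</a>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact comparison: the whole attribute value must equal "thumbnail".
$exact = $xpath->query('//a[@class="thumbnail"]');

// Token comparison: "thumbnail" must appear as a separate class name.
$token = $xpath->query(
    "//a[contains(concat(' ', normalize-space(@class), ' '), ' thumbnail ')]"
);

// Plain substring comparison: also matches "thumbnail-wide".
$substring = $xpath->query('//a[contains(@class, "thumbnail")]');

echo $exact->length, ' ', $token->length, ' ', $substring->length, "\n";
// prints "1 2 3"
```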

This code should work for links whose class attribute is exactly "thumbnail":
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hyperlinks = $xpath->query('//a[@class="thumbnail"]');
foreach ($hyperlinks as $hyperlink) {
    echo $hyperlink->getAttribute('href'), '<br>';
}

If you're using simple_html_dom, why are you doing all these superfluous things? It already wraps the resource in everything you need. See http://simplehtmldom.sourceforge.net/manual.htm
include('simple_html_dom.php');
// set up:
$html = new simple_html_dom();
// load from URL:
$html->load_file('http://www.reddit.com/r/funny');
// find those <a> elements:
$links = $html->find('a[class=thumbnail]');
// find() returns an array of elements, so print each href in turn:
foreach ($links as $link) {
    echo $link->href, "\n";
}

Tested it and made some changes. This works perfectly too.
<?php
// load the url and set up an array for the links
$dom = new DOMDocument();
@$dom->loadHTMLFile('http://www.reddit.com/r/funny');
$links = array();
// loop through all the A elements found
foreach ($dom->getElementsByTagName('a') as $link) {
    $url = $link->getAttribute('href');
    $class = $link->getAttribute('class');
    // Check if the URL is not empty and if the class contains thumbnail
    if (!empty($url) && strpos($class, 'thumbnail') !== false) {
        array_push($links, $url);
    }
}
// Print results
print_r($links);
?>

get value of <h2> of html page with PHP DOM?

I have a variable $link that holds an HTTP (craigslist) link, and I put the contents of that page into $linkhtml, so $linkhtml contains the HTML code of the craigslist page at $link.
I need to extract the text between <h2> and </h2>. I could use a regexp, but how do I do this with PHP DOM? I have this so far:
$linkhtml= file_get_contents($link);
$dom = new DOMDocument;
@$dom->loadHTML($linkhtml);
What do I do next to put the contents of the element <h2> into a var $title?

If DOMDocument looks complicated to understand or use, you may try PHP Simple HTML DOM Parser, which offers a very simple way to parse HTML.
require 'simple_html_dom.php';
$html = '<h1>Header 1</h1><h2>Header 2</h2>';
$dom = new simple_html_dom();
$dom->load( $html );
$title = $dom->find('h2',0)->plaintext;
echo $title; // outputs: Header 2

You can use this code:
$linkhtml= file_get_contents($link);
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($linkhtml); // loads your html
$xpath = new DOMXPath($doc);
$h2text = $xpath->evaluate("string(//h2/text())");
// $h2text is your text between <h2> and </h2>
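
One caveat with string(//h2/text()): it returns only the first text node directly inside the first <h2>, so text wrapped in inline tags is skipped. The sketch below uses invented markup to show the difference; the variable names $firstTextNode and $title are illustrative.

```php
<?php
// Made-up markup with an inline element inside the heading.
$linkhtml = '<html><body><h2>Nice <b>bike</b> for sale</h2></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($linkhtml);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Only the first text node directly inside the first <h2>.
$firstTextNode = $xpath->evaluate('string(//h2/text())');

// All text inside the first <h2>, with whitespace trimmed and collapsed.
$title = $xpath->evaluate('normalize-space(//h2)');

var_dump($firstTextNode, $title);
```

Here $title should hold the full heading text, while $firstTextNode stops before the <b> element.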

You can do this with XPath (untested, may contain errors):
$linkhtml= file_get_contents($link);
$dom = new DOMDocument;
@$dom->loadHTML($linkhtml);
$xpath = new DOMXPath($dom);
// //h2 finds headings at any depth; /html/body/h2 would only match direct children of <body>
$elements = $xpath->query("//h2");
// query() returns false, not null, when the expression is invalid
if ($elements !== false) {
    foreach ($elements as $element) {
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            echo $node->nodeValue. "\n";
        }
    }
}
