PHP with DOM can not get the description correctly - php

I am using the following code to get the first of a page. However I can not get it. What am I missing here ?
$doc = new DOMDocument();
$doc->loadhtmlfile("");
$xpath = new DOMXpath($doc);
$descr = $xpath->query('//div[#class="description"]');
print_r($descr);

query() returns a DOMNodeList, to get the <div> DOMNode, you need to get it from the list:
$descr = $xpath->query('//div[#class="description"]')->item(0);
Now, $descr contains a DOMNode of the first <div> with class description.

Related

retrieving certain attributes using DOMDocument

I'm trying to figure out how parse an html page to get a forms action value, the labels within the form tab as well as the input field names. I took at look at php.net Domdocument and it tells me to get a childnode but all that does is give me errors that it doesnt exist. I also tried doing print_r of the variable holding the html content and all that shows me is length=1. Can someone show me a few samples that i can use because php.net is confusing to follow.
<?php
$content = "some-html-source";
$content = preg_replace("/&(?!(?:apos|quot|[gl]t|amp);|#)/", '&', $content);
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadHTML($content);
$form = $dom->getElementsByTagName('form');
print_r($form);
I suggest using DomXPath instead of getElementsByTagName because it allows you to select attribute values directly and returns a DOMNodeList object just like getElementsByTagName. The # in #action indicates that we're selecting by attribute.
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DomXPath($doc);
$action = $xpath->query('//form/#action')->item(0);
var_dump($action);
Similarly, to get the first input
$action = $xpath->query('//form/input')->item(0);
To get all input fields
for($i=0;$i<$xpath->query('//form/input')->length;$i++) {
$label = $xpath->query('//form/input')->item($i);
var_dump($label);
}
If you're not familiar with XPath, I recommend viewing these examples.

DOMXpath query returns null

$doc = new DOMDocument();
$doc->loadHTMLFile("https://www.tipico.com/en/wettschein/bslc-bVysdHEpshHRDMQ7E-Y5Q%3D%3D/");
$xpath = new DOMXpath($doc);
$footer = $xpath->query("//div[#class='t_foot']/div[1]/div[1]");
var_dump($footer->item(0)->nodeValue);
Shouldn't this return 48,37? I have other xpath queries which are working, but especially this is not.
The problem is that t_foot is not the only class on the element you are trying to get, so the class name is not equal to the string t_foot. Instead you should select element which class contains t_foot. So XPath expression should be this:
$footer = $xpath->query('//div[contains(#class, "t_foot")]/div[1]/div[1]');

How to get full HTML from DOMXPath::query() method?

I have document from which I want to extract specific div with it's untouched content.
I do:
$dom = new DOMDocument();
$dom->loadHTML($string);//that's HTML of my document, string
and xpath query:
$xpath = new DOMXPath($dom);
$xpath_resultset = $xpath->query("//div[#class='text']");
/*I'm after div class="text"*/
now I do item(0) method on what I get with $xpath_resultset
$my_content = $xpath_resultset->item(0);
what I get is object (not string) $my_content which I can echo or settype() to string, but as result I get is with fully stripped markup?
What to do to get all from div class='text' here?
Just pass the node to the DOMDocument::saveHTML method:
$htmlString = $dom->saveHTML($xpath_resultset->item(0));
This will give you a string representation of that particular DOMNode and all its children.

PHP DOMXpath not picking anything up

I'm trying to write a script that grabs the URL of the first image from this website: http://www.slothradio.com/covers/?adv=&artist=pantera&album=vulgar+display+of+power
Here's my script:
$content = file_get_contents($url);
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXpath($doc);
$elements = $xpath->query("*/div[#class='album0']/img");
echo '<pre>';print_r($elements);exit;
When I run that, it outputs
DOMNodeList Object
(
)
Even when I change my query to $xpath->query("*/img"), I still get nothing. What am I doing wrong?
$doc->loadHTMLFile($content); takes in FILE PATH not HTML content see documentation
http://php.net/manual/en/domdocument.loadhtmlfile.php
Use
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
To Output Element use
var_dump(iterator_to_array($elements));
//Or
print_r(iterator_to_array($elements));
Thanks
:)
What am I doing wrong?
You are using print_r, but DOMNodeList does not offer any output for that function (because it's an internal class). You can start with outputting the number of items for example. In the end you need to iterate over the node list and deal with each node on your own.
printf("Found %d element(s).\n", $elements->length);

How to cut off a portion of a html inside <div> and store it as html string by using xpath and domdocument?

I would like to cut off some portion of html, I can take it by using XPath and DomDocument but the problem is that I need result as a html code string. Normally I would use reg. expr. for that but I wouldn't like to do a complicated search pattern that would mach the begining and the end of tag.
That's the example input:
some html code before
<div>this <b>is</b> what I want</div>
some html after
and the output:
<div>this <b>is</b> what I want</div>
I tried something like this:
subject = 'some html code before
<div>this <b>is</b> what I want</div>
some html after';
$doc = new DOMDocument();
$doc->loadHTML($subject);
$xpath = new DOMXpath($doc);
$result = $xpath->query("//div/*");
echo $result->saveHTML();
but i got only error:
Call to undefined method DOMNodeList::saveHTML()
Does anyone know how to get the result as a html string by using DomDocument and XPath?
Thank you Gentleman for pointing out my missunderstanding with accessing methods that are not aviailable in a child object. But line:
echo $doc->saveHTML($result->item(0));
generates only warning (without the html sting I want to have). Luckily I found another soulution and here it is:
<?php
$subject = '<html>
<head>
<title>A very short ebook</title>
<meta name="charset" value="utf-8" />
</head>
<body>
<h1 class="bookTitle">A very short ebook</h1>
<p style="text-align:right">Written by Kovid Goyal</p>
<div class="introduction">
<p>A very short ebook to demonstrate the use of XPath.</p>
</div>
<h2 class="chapter">Chapter One</h2>
<p>This is a truly fascinating chapter.</p>
<h2 class="chapter">Chapter Two</h2>
<p>A worthy continuation of a fine tradition.</p>
</body>
</html>';
$doc = new DOMDocument();
$doc->loadHTML($subject);
$xpath = new DOMXpath($doc);
$result = $xpath->query("//div");
//echo $doc->saveHTML($result->item(0));
echo domNodeList_to_string($result);
function domNodeList_to_string($DomNodeList) {
$output = '';
$doc = new DOMDocument;
while ( $node = $DomNodeList->item($i) ) {
// import node
$domNode = $doc->importNode($node, true);
// append node
$doc->appendChild($domNode);
$i++;
}
$output = $doc->saveHTML();
$output = print_r($output, 1);
// I added this because xml output and ajax do not like each others
//$output = htmlspecialchars($output);
return $output;
}
php>
so if one has a query like that:
$result = $xpath->query("//div");
then will get the raw html string output:
<div class="introduction">
<p>A very short ebook to demonstrate the use of XPath.</p>
</div>
if the query is:
$result = $xpath->query("//p");
then output will be:
<p style="text-align:right">Written by Kovid Goyal</p><p>A very short ebook to demonstrate the use of XPath.</p><p>This is a truly fascinating chapter.</p><p>A worthy continuation of a fine tradition.</p>
Does anyone know simpler (embeded in php) method to get the same result?
Try this:
$subject = 'some html code before
<div>this <b>is</b> what I want</div>
some html after';
$doc = new DOMDocument();
$doc->loadHTML($subject);
$xpath = new DOMXpath($doc);
$result = $xpath->query("//div");
echo $doc->saveHTML($result->item(0)); //echoes what you want :)
The saveHTML function belongs to the DOMDocument object, you can't call it directly on the node (much less on the NodeList, which is what the query returns), but what you can do is pass it the node as a param.
Also, your query was wrong: what you want is the div element (i.e. //div), not its children (//div/*).
As per the php manual docs on DOMXPath::querydocs, the function:
Returns a DOMNodeList containing all nodes matching the given XPath
expression. Any expression which does not return nodes will return an
empty DOMNodeList.
This means that the $result in the following code will be a DOMNodeListdocs object. So if you want to get individual HTML code out from inside it you'll need to use methods available with a DOMNodeList object. In this case, the item method:
$result = $xpath->query("//div");
echo $doc->saveHTML($result->item(0));
$result->item(0) returns the first DOMNode in the DOMNodeList created by your xpath query.
Try this :
$subject = 'some html code before<div>this <b>is</b> what I want</div>some html after';
$doc = new DOMDocument('1.0');
$doc->loadHTML($subject);
$xpath = new DOMXpath($doc);
$result = $xpath->query("//div");
$docSave = new DOMDocument('1.0');
foreach ( $result as $node ) {
$domNode = $docSave->importNode($node, true);
$docSave->appendChild($domNode);
}
echo $docSave->saveHTML();

Categories