How to get title from a website by xpath?

I am playing around with XPath, but I have no idea how to, for example, get the title of a website using XPath. Here is my code, but I don't know what to do next:
$dom = new DOMDocument();
$dom->loadHTMLFile("http://www.cool.de");
$x=new DOMXPath($dom);
$result = $x->query("//TITLE");
//...???
Also, print_r($result) only shows me "Object". Is there a function like print_r that shows what is inside an object, so I don't have to guess?

$result is a DOMNodeList, so read the first item from it:
echo $result->item(0)->textContent;
Edit: XPath is case-sensitive, and the HTML parser stores element names in lower case, so the query must use lower case too:
echo $x->query('//title')->item(0)->textContent;
This now works.
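A minimal self-contained sketch of the answer above. It assumes an inline HTML string in place of the remote page (loadHTMLFile($url) would fetch the real one). It shows that the upper-case query finds nothing, and that checking the DOMNodeList's length property (or using string() in evaluate()) avoids calling item(0) on an empty list:

```php
<?php
// Inline HTML is a hypothetical stand-in for the remote page;
// $dom->loadHTMLFile($url) would load the real page instead.
$html = '<!DOCTYPE html><html><head><TITLE>Example page</TITLE></head>'
      . '<body><p>Hello</p></body></html>';

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true); // keep warnings about messy real-world HTML quiet
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$x = new DOMXPath($dom);

// The HTML parser lower-cases element names, so an upper-case query matches nothing.
$upper = $x->query('//TITLE');

// query() returns a DOMNodeList: check length first, because item(0)
// returns null on a page that has no <title>.
$nodes = $x->query('//title');
$title = $nodes->length > 0 ? $nodes->item(0)->textContent : '';

// evaluate() with string() returns the text directly, or '' when nothing matches.
$title2 = $x->evaluate('string(//title)');

echo $title, "\n";
```

The evaluate() form is handy when you only need one string, since it never returns a node list you have to unwrap.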

Related

trouble converting XML to something usable in PHP

I am getting an XML response from calling:
$foo = $client->__doRequest (parameters here)
When I echo $foo, I get exactly the XML I'm told I should. The problem is that I now want to extract some values from the XML. The easiest way I can see to do that is to convert it to a PHP array, which would make it super simple to get values and do lots of lovely stuff with them, but I'm having trouble doing this. I have seen a lot of examples using SimpleXML (simplexml_load_string()), but all I get is 'Notice: Array to string conversion in'. When I var_dump($foo), I get: string 'xml'.
What am I doing wrong?

As suggested by @CD001, I persevered with DOMDocument and figured it out in the end with the following code:
$dom = new DOMDocument;
$dom->loadXML($xml);
$things = $dom->getElementsByTagName('chocolate');
/** I only had a single result, so I did it this way rather than with a loop **/
if ($things->length > 0) {
$node = $things->item(0);
$chocolate = $node->nodeValue;
}
else {
$chocolate = ''; // empty result set
}
echo $chocolate;
Bah! JSON is so much nicer...
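Since the original goal was a PHP array, here is a minimal sketch that loops over every <chocolate> element and collects the values. The XML string is a hypothetical response, not the real API's; the ?? '' default keeps the single-result variable defined when nothing matches:

```php
<?php
// Hypothetical response; the real one would come from $client->__doRequest(...).
$xml = '<response><chocolate>dark</chocolate><chocolate>milk</chocolate></response>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// Collect the text of every <chocolate> element into a plain PHP array.
$chocolates = [];
foreach ($dom->getElementsByTagName('chocolate') as $node) {
    $chocolates[] = $node->nodeValue;
}

// Single-result case: fall back to '' so the variable is always defined.
$first = $chocolates[0] ?? '';

print_r($chocolates);
```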

Use XPath:
$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXpath($dom);
// get content of the first chocolate element node as a string
$chocolate = $xpath->evaluate('string(//chocolate)');
echo $chocolate;
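A short runnable sketch of the XPath approach, using a hypothetical one-element response. It shows that string() returns an empty string when nothing matches (so no if/else is needed) and that count() can be evaluated the same way; XPath count() comes back as a float, hence the cast:

```php
<?php
$xml = '<response><chocolate>dark</chocolate></response>'; // hypothetical response

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// string(...) converts the first matching node to text; no match gives ''.
$chocolate = $xpath->evaluate('string(//chocolate)');
$missing = $xpath->evaluate('string(//vanilla)');

// count(...) is returned as a float, so cast it when an integer is needed.
$howMany = (int) $xpath->evaluate('count(//chocolate)');

echo $chocolate, "\n";
```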

DOMElement empty nodeValue

I have a project where I need to parse an XML page and pick out some data. The DOMDocument class seems perfect, and I tried a few basic tests to see if it would do what I wanted.
Here is my code for the moment:
$dom = new domDocument;
$html = file_get_contents('http://wadmag.com/feed.xml');
$previous_value = libxml_use_internal_errors(TRUE);
$dom->loadHTML("$html");
libxml_clear_errors(); // clear the errors caused by the page not being proper HTML
libxml_use_internal_errors($previous_value); // restore the previous error-handling setting
$links = $dom->getElementsByTagName('item');
echo "Found : ".$links->length. " items";
foreach ($links as $link) {
echo $link->nodeValue."<br>";
}
Now the problem is that when I load the page, I get the message "Found: 21 items", meaning that the getElementsByTagName returned a list, but when I try to display the contents of the list, nothing is displayed, as if the nodeValue was empty.
The even weirder thing is that if I replace "link" in the getElementsByTagName call with title or description, everything displays as it should. I can't understand why; the only difference I can see is that <title> and <description> might be proper HTML whereas <link> is not.

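If you parse XML, use $dom->loadXML($response) instead of $dom->loadHTML($response). The HTML parser applies HTML rules; for example, it treats <link> as an empty element, so the URL text inside an RSS <link> does not end up in that node's nodeValue.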
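A minimal sketch of the loadXML() fix, assuming a small hypothetical RSS snippet in place of feed.xml. Each <item> is searched for its own <link>, and with the XML parser the link text is kept in the node:

```php
<?php
// Small hypothetical RSS snippet standing in for feed.xml.
$rss = <<<XML
<rss version="2.0">
  <channel>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($rss); // XML parser: <link> keeps its text content

$links = [];
foreach ($dom->getElementsByTagName('item') as $item) {
    // Search only inside the current <item>, not the whole document.
    $links[] = $item->getElementsByTagName('link')->item(0)->nodeValue;
}

echo 'Found: ', count($links), " items\n";
echo implode("\n", $links), "\n";
```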

PHP DOMXpath not picking anything up

I'm trying to write a script that grabs the URL of the first image from this website: http://www.slothradio.com/covers/?adv=&artist=pantera&album=vulgar+display+of+power
Here's my script:
$content = file_get_contents($url);
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXpath($doc);
$elements = $xpath->query("*/div[@class='album0']/img");
echo '<pre>';print_r($elements);exit;
When I run that, it outputs
DOMNodeList Object
(
)
Even when I change my query to $xpath->query("*/img"), I still get nothing. What am I doing wrong?

$doc->loadHTMLFile($content) won't work: loadHTMLFile() takes a file path, not HTML content. See the documentation:
http://php.net/manual/en/domdocument.loadhtmlfile.php
Use
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
To output the elements, use:
var_dump(iterator_to_array($elements));
//Or
print_r(iterator_to_array($elements));

What am I doing wrong?
You are using print_r, but DOMNodeList does not expose its items to that function (because it's an internal class). You can start by outputting the number of items, for example. In the end, you need to iterate over the node list and deal with each node yourself.
printf("Found %d element(s).\n", $elements->length);
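A sketch of iterating the node list instead of print_r'ing it, using hypothetical markup (the real page's structure is an assumption). It also uses //div rather than */div: when query() has no context node, */div only matches <div> elements that are direct children of the root <html> element, which is usually not where page content lives:

```php
<?php
// Hypothetical markup; the real page's structure is an assumption.
$html = '<html><body><div class="album0"><img src="cover0.jpg"></div>'
      . '<div class="album1"><img src="cover1.jpg"></div></body></html>';

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true); // ignore warnings about imperfect HTML
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);

// "//" searches the whole document for matching <div> elements.
$elements = $xpath->query("//div[@class='album0']/img");

printf("Found %d element(s).\n", $elements->length);

// Iterate and read what you need from each node, e.g. the src attribute.
$srcs = [];
foreach ($elements as $img) {
    $srcs[] = $img->getAttribute('src');
}
echo implode("\n", $srcs), "\n";
```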

PHP DOMDocument getting Attribute of Tag

Hello, I have an API response in XML format with a series of items such as this:
<item>
<title>blah balh</title>
<pubDate>Tue, 20 Oct 2009 </pubDate>
<media:file date="today" data="example text string"/>
</item>
I want to use DOMDocument to get the attribute "data" from the tag "media:file". My attempt below doesn't work:
$xmldoc = new DOMDocument();
$xmldoc->load('api response address');
foreach ($xmldoc->getElementsByTagName('item') as $feeditem) {
$nodes = $feeditem->getElementsByTagName('media:file');
$linkthumb = $nodes->item(0)->getAttribute('data');
}
What am I doing wrong? Please help.
EDIT: For some reason I can't leave comments, Mark. I get the error
Call to a member function getAttribute() on a non-object
when I run my code. I have also tried
$nodes = $feeditem->getElementsByTagNameNS('uri','file');
$linkthumb = $nodes->item(0)->getAttribute('data');
where uri is the URI of the media namespace (NS), but I get the same problem again.
Note that the media element is of the form <media:file ... /> rather than <media:file>...</media:file>. I think this is part of the problem, as I generally have no issue parsing attributes.

The example you provided should not generate an error. I tested it, and $linkthumb contained the string "example text string" as expected.
Ensure the media namespace is declared in the returned XML; otherwise DOMDocument will report an error.
If you are getting a specific error, please edit your post to include it.
Edit:
Try the following code:
$xmldoc = new DOMDocument();
$xmldoc->load('api response address');
foreach ($xmldoc->getElementsByTagName('item') as $feeditem) {
$nodes = $feeditem->getElementsByTagName('file');
$linkthumb = $nodes->item(0)->getAttribute('data');
echo $linkthumb;
}
You may also want to look at SimpleXML and XPath, as they make reading XML much easier than DOMDocument.
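A namespace-aware sketch of both routes, getElementsByTagNameNS() and DOMXPath with a registered prefix. The namespace URI below (the Media RSS one) is an assumption; use whatever URI the real API response declares for the media prefix. The length check avoids calling getAttribute() on null, which is what the "non-object" error above points to:

```php
<?php
// Assumed namespace URI; the real response must declare xmlns:media itself.
$mediaNs = 'http://search.yahoo.com/mrss/';

$xml = <<<XML
<rss xmlns:media="http://search.yahoo.com/mrss/">
  <item>
    <title>blah balh</title>
    <pubDate>Tue, 20 Oct 2009 </pubDate>
    <media:file date="today" data="example text string"/>
  </item>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Option 1: look the element up by namespace URI + local name.
$values = [];
foreach ($doc->getElementsByTagName('item') as $item) {
    $files = $item->getElementsByTagNameNS($mediaNs, 'file');
    // Skip items without a media:file element instead of calling a method on null.
    if ($files->length > 0) {
        $values[] = $files->item(0)->getAttribute('data');
    }
}

// Option 2: XPath, with a prefix of our choosing bound to the same URI.
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('m', $mediaNs);
$fromXPath = $xpath->evaluate('string(//item/m:file/@data)');

echo implode("\n", $values), "\n", $fromXPath, "\n";
```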

Alternatively,
$DOMNode->attributes->getNamedItem('MyAttribute')->value;
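A small sketch of the attributes route on a hypothetical element. getNamedItem() returns a DOMAttr, or null when the attribute is absent, so the chained ->value needs a null check; DOMElement::getAttribute() instead returns an empty string for a missing attribute, and hasAttribute() tells the two cases apart:

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<file date="today" data="example text string"/>'); // hypothetical element
$el = $doc->documentElement;

// DOMNamedNodeMap::getNamedItem() gives a DOMAttr or null.
$attr = $el->attributes->getNamedItem('data');
$data = $attr !== null ? $attr->value : null;

$missing = $el->attributes->getNamedItem('nope'); // null: no such attribute

// getAttribute() returns '' for a missing attribute; hasAttribute() disambiguates.
$viaGet = $el->getAttribute('data');
$hasNope = $el->hasAttribute('nope');

echo $data, "\n";
```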

Finding number of nodes in PHP, DOM, XPath

I am loading HTML into DOM and then querying it using XPath in PHP. My current problem is how do I find out how many matches have been made, and once that is ascertained, how do I access them?
I currently have this dirty solution:
$i = 0;
foreach($nodes as $node) {
echo $dom->savexml($nodes->item($i));
$i++;
}
Is there a cleaner way to find the number of nodes? I have tried count(), but that does not work.

You haven't posted any code related to $nodes so I assume you are using DOMXPath and query(), or at the very least, you have a DOMNodeList.
DOMXPath::query() returns a DOMNodeList, which has a length property. Given your code, you can access it via:
$nodes->length
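A cleaned-up version of the loop from the question, on hypothetical data. foreach hands you each node directly, so the manual $i counter is unnecessary; indexed access with item($i) also works when bounded by length. On PHP 7.2 and later, DOMNodeList also implements Countable, so count($nodes) works there too:

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<list><entry>a</entry><entry>b</entry><entry>c</entry></list>'); // hypothetical data
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//entry');

// Number of matches.
$total = $nodes->length;
$alsoTotal = count($nodes); // PHP 7.2+: DOMNodeList implements Countable

// Serialize each matched node without tracking an index.
$serialized = [];
foreach ($nodes as $node) {
    $serialized[] = $dom->saveXML($node);
}

// Indexed access, using length as the upper bound.
$texts = [];
for ($i = 0; $i < $nodes->length; $i++) {
    $texts[] = $nodes->item($i)->textContent;
}

echo $total, "\n", implode("\n", $serialized), "\n";
```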

If you just want to know the count, you can also use DOMXPath::evaluate.
Example from the PHP manual:
$doc = new DOMDocument;
$doc->load('book.xml');
$xpath = new DOMXPath($doc);
$tbody = $doc->getElementsByTagName('tbody')->item(0);
// our query is relative to the tbody node
$query = 'count(row/entry[. = "en"])';
$entries = $xpath->evaluate($query, $tbody);
echo "There are $entries english books\n";
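The manual example depends on a book.xml file that is not shown here, so this sketch substitutes a hypothetical inline document with just enough structure for the same relative query. Note that XPath count() is returned as a float, hence the cast:

```php
<?php
// Hypothetical stand-in for the manual's book.xml.
$xml = <<<XML
<book>
  <tbody>
    <row><entry>en</entry><entry>Title A</entry></row>
    <row><entry>de</entry><entry>Title B</entry></row>
    <row><entry>en</entry><entry>Title C</entry></row>
  </tbody>
</book>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$tbody = $doc->getElementsByTagName('tbody')->item(0);

// Relative query: evaluated with <tbody> as the context node.
$english = (int) $xpath->evaluate('count(row/entry[. = "en"])', $tbody);

// Absolute query over the whole document.
$rows = (int) $xpath->evaluate('count(//row)');

echo "There are $english English books out of $rows rows\n";
```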
