confused with xpath - php

I've got this PHP code loading in some html.
$dom = new DOMDocument();
$dom->loadHTML($somehtml);
$xpath = new DOMXPath($dom);
$divContent = $xpath->query('//table[@class="defURLP"]');
var_dump($divContent);
I'm too confused to understand quite what needs to happen here. What I want is for the variable $divContent to hold the HTML contents of the table with the class name defURLP.
It's currently just returning:
object(DOMNodeList)#3 (0) { }

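You need to retrieve the first item from the DOMNodeList returned by your XPath query, since the query returns a list that may hold more than one node. Also, attributes are selected with @, as in //table[@class="defURLP"]; without it, [class="defURLP"] tests for a child element named class.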
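// Query for tables whose class attribute is defURLP
$tables = $xpath->query('//table[@class="defURLP"]');
// Reference the first one in $divContent (item() returns null if nothing matched)
$divContent = $tables->item(0);
// Output its nodeValue (the table's text, without tags)
if ($divContent !== null) {
    echo $divContent->nodeValue;
}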
Or iterate over the node list with a foreach:
$tables = $xpath->query('//table[@class="defURLP"]');
// Iterate over the whole node list in $tables (if it contains multiple nodes)
foreach ($tables as $t) {
    echo $t->nodeValue;
}
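If the goal is the table's HTML rather than its text, a node can be serialized with DOMDocument::saveHTML(), passing the node as its argument (PHP 5.3.6 or later). A minimal sketch, assuming a made-up $somehtml string; it also swaps the @ error-suppression operator for libxml_use_internal_errors() to silence parser warnings:

```php
<?php
// Hypothetical sample markup standing in for $somehtml from the question.
$somehtml = '<html><body>'
    . '<table class="defURLP"><tr><td>One</td><td>Two</td></tr></table>'
    . '</body></html>';

$dom = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($somehtml);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// The @ selects the class attribute, not a child element named "class".
$tables = $xpath->query('//table[@class="defURLP"]');

$text = '';
$html = '';
if ($tables->length > 0) {
    $table = $tables->item(0);
    // nodeValue: concatenated text of all descendants, tags stripped.
    $text = $table->nodeValue;
    // saveHTML($node) serializes the node itself, tags included.
    $html = $dom->saveHTML($table);
}

echo $text, "\n";
echo $html, "\n";
```

The same saveHTML($node) call works inside the foreach loop when there are several matching tables.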

Related

DOMXPath object value omitted

I have read many Stack Overflow questions and I'm using this code, but I don't know why it doesn't work.
Here is the code:
$url = 'http://m.cricbuzz.com/cricket-schedule';
$source = file_get_contents($url);
$doc = new DOMDocument;
@$doc->loadHTML($source);
$xpath = new DOMXPath($doc);
$classname = "list-group";
$events = $xpath->query("//*[contains(@class, '$classname')]");
var_dump($xpath);
Can you please check why this is not working? I want to get the data from list-group.
The code is correct. It correctly fetches a list of DOM nodes having the specified class attribute value into the $events variable:
$events = $xpath->query("//*[contains(@class, '$classname')]");
which is an instance of DOMNodeList. Next you should iterate the list and fetch the data you need from $events. For example, if you need the outer HTML for the nodes, use something like this:
foreach ($events as $e) {
printf("<<<<<\n%s\n>>>>>\n", $e->ownerDocument->saveXML($e));
}
P.S.: I would rename $events to $elements.
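A side note: contains(@class, 'list-group') is a substring test, so it also matches classes such as list-group-item. A common XPath 1.0 idiom pads the normalized class list with spaces so only the whole token matches. A sketch with hypothetical markup:

```php
<?php
// Hypothetical markup: a <ul> whose class list contains the token
// "list-group", and an <li> whose class merely starts with that string.
$source = '<html><body>'
    . '<ul class="list-group main"><li class="list-group-item">A</li></ul>'
    . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($source);
libxml_clear_errors();
$xpath = new DOMXPath($doc);
$classname = 'list-group';

// Substring test: matches both the ul and the li.
$loose = $xpath->query("//*[contains(@class, '$classname')]");

// Whole-token test: ' list-group main ' contains ' list-group ',
// but ' list-group-item ' does not.
$strict = $xpath->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]"
);

echo $loose->length, "\n";
echo $strict->length, "\n";
foreach ($strict as $e) {
    echo $e->ownerDocument->saveXML($e), "\n";
}
```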

Get just the first item with DOMDocument in PHP

I am using the code below to get the elements inside a particular HTML element:
$dom = new DOMDocument();
@$dom->loadHTML($google_html);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//span[@class="st"]');
foreach ($tags as $tag) {
echo $tag->nodeValue;
}
Now the problem is that the code gives me all of the elements with that class, but I only need the first item that has that class name.
So I don't need a foreach loop.
How can I change the code to get JUST the FIRST item?
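To make sure you get just the first matching span in the whole document, wrap the path in parentheses before applying [1]:
$xpath->query('(//span[@class="st"])[1]');
Without the parentheses, //span[@class="st"][1] selects every span that is the first matching span among its own siblings, which can still be more than one node.
The following gets the only item in the resulting DOMNodeList:
$tags = $xpath->query('(//span[@class="st"])[1]');
$first = $tags->item(0);
$text = $first->textContent;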
See XPath: Select first element with a specific attribute
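A small sketch contrasting the two forms, using made-up markup where each span is the only "st" span inside its own div:

```php
<?php
// Hypothetical markup: two "st" spans, each the first of its kind
// inside its own <div>.
$google_html = '<html><body>'
    . '<div><span class="st">first</span></div>'
    . '<div><span class="st">second</span></div>'
    . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($google_html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// [1] applies per parent: each span is first among its siblings, so both match.
$perParent = $xpath->query('//span[@class="st"][1]');

// Parentheses apply [1] to the whole result set: first span in document order.
$firstInDoc = $xpath->query('(//span[@class="st"])[1]');

echo $perParent->length, "\n";
echo $firstInDoc->length, "\n";

$first = $firstInDoc->item(0);
if ($first !== null) {
    echo $first->textContent, "\n";
}
```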

PHP XPath Table elements disappearing

I have just learned about XPath and I am wanting to read data from only certain columns in a table.
My current code looks like this:
<?php
$file_contents = file_get_contents('test.html');
$dom_document = new DOMDocument();
$dom_document->loadHTML($file_contents);
//use DOMXpath to navigate the html with the DOM
$dom_xpath = new DOMXpath($dom_document);
$elements = $dom_xpath->query("//tr[@class='rowstyle']");
if ($elements !== false && $elements->length > 0) {
foreach ($elements as $element)
{
echo $element->nodeValue . '<br />';
}
}
else
{
echo 'none';
}
?>
I also tried a variation of the query, because in my research I have seen lots of issues with nested table elements, but it produces the same result:
$elements = $dom_xpath->query("//table[@class='tablestyle']/tbody/tr[@class='rowstyle']");
It does grab the row of data, but it turns it into a single string, combining all of the cells and making the tags disappear.
What I really want to do is separate those cells and grab a certain row number.
I am also curious how to find out which version of XPath I have. My PHP version is 5.3.5.
It's not combining those cells. You're outputting nodeValue, which for an element behaves like textContent: it returns the concatenated text of all descendant nodes, which is why the tags disappear. If you want to work on the cells themselves, either use childNodes or an XPath query that uses the row as the context node, then loop over the cells.
Example:
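$dom_xpath = new DOMXpath($dom_document);
$elements = $dom_xpath->query("//tr[@class='rowstyle']");
foreach ($elements as $element)
{
    foreach ($element->childNodes as $cell) {
        // childNodes may include whitespace text nodes; keep only elements (td/th)
        if ($cell->nodeType === XML_ELEMENT_NODE) {
            echo $cell->nodeValue . '<br />';
        }
    }
}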
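Combining both suggestions (a positional predicate to choose the row, and a query relative to that row for its cells) could look roughly like this; the markup stands in for test.html and is hypothetical:

```php
<?php
// Hypothetical stand-in for test.html: a table with two "rowstyle" rows.
$file_contents = '<html><body><table class="tablestyle">'
    . '<tr class="rowstyle"><td>a1</td><td>b1</td><td>c1</td></tr>'
    . '<tr class="rowstyle"><td>a2</td><td>b2</td><td>c2</td></tr>'
    . '</table></body></html>';

$dom_document = new DOMDocument();
libxml_use_internal_errors(true);
$dom_document->loadHTML($file_contents);
libxml_clear_errors();
$dom_xpath = new DOMXPath($dom_document);

// Pick the 2nd matching row in document order (XPath positions start at 1).
$row = $dom_xpath->query("(//tr[@class='rowstyle'])[2]")->item(0);

$cells = array();
if ($row !== null) {
    // Relative query: 'td' is evaluated with $row as the context node,
    // so only that row's cells come back (no whitespace text nodes).
    foreach ($dom_xpath->query('td', $row) as $cell) {
        $cells[] = $cell->nodeValue;
    }
}
echo implode(' | ', $cells), "\n";

// A single column can be picked by position too, e.g. the 3rd cell.
$third = $dom_xpath->query('td[3]', $row)->item(0);
echo $third ? $third->nodeValue : 'none', "\n";
```

As for the XPath version: DOMXPath is backed by libxml2, which implements XPath 1.0.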

DOMNodeList, xPath and PHP

I am parsing an HTML page with DOM and XPath in PHP.
I have to fetch a nested <Table...></table> from the HTML.
I have defined a query using FirePath in the browser which is pointing to
html/body/table[2]/tbody/tr/td[2]/table[2]/tbody/tr/td/table
When I run the code, it reports that the fetched DOMNodeList has length 0. My objective is to output the queried <table> as a string. This is an HTML scraping script in PHP.
Below is the function. How can I extract the required <table>?
$pageUrl = "http://www.boc.cn/sourcedb/whpj/enindex.html";
getExchangeRateTable($pageUrl);
function getExchangeRateTable($url){
$htmlTable = "";
$xPathTable = null;
$xPathQuery1 = "html/body/table[2]/tbody/tr/td[2]/table[2]/tbody/tr/td/table";
if(strlen($url)==0){die('Argument exception: method call [getExchangeRateTable] expects a string of URL!');}
// initialize objects
$page = tidyit($url);
$dom = new DOMDocument();
$dom->loadHTML($page);
$xpath = new DOMXPath($dom);
// $elements is appearing as a DOMNodeList
$elements = $xpath->query($xPathQuery1);
// print_r($elements);
foreach($elements as $e){
// append each matched table's markup to the result string
$htmlTable .= $dom->saveHTML($e);
}
return $htmlTable;
}
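Have you tried something like this?
$dom = new DOMDocument;
// preserveWhiteSpace can only have an effect if it is set before loading
$dom->preserveWhiteSpace = false;
$dom->loadHTML($tes); // $tes holds the page's HTML
$tables = $dom->getElementsByTagName("table");
$rows = $tables->item(0)->getElementsByTagName("tr");
print_r($rows);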
Remove the tbody steps from your XPath query: in most cases the tbody elements are inserted by your browser, as they are on the page you are trying to scrape, and they are not in the HTML source that DOMDocument parses.
/html/body/table[2]/tr/td[2]/table[2]/tr/td/table
This will most likely work.
However, it's probably safer to use a different XPath. The following XPath selects the th whose text contains "Currency Name", then its parent tr, then that tr's parent (a tbody or a table):
//th[contains(text(),'Currency Name')]/parent::tr/parent::*
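A quick way to check the tbody point is to parse a table that has no tbody in its source and compare the queries. A sketch with made-up markup, assuming libxml's HTML parser, which (unlike a browser) does not insert an implied tbody:

```php
<?php
// Hypothetical markup without an explicit <tbody>, as served by many pages.
$page = '<html><body><table>'
    . '<tr><th>Currency Name</th><th>Rate</th></tr>'
    . '<tr><td>USD</td><td>1.0</td></tr>'
    . '</table></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($page);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Path copied from a browser inspector, which shows an implied <tbody>.
$withTbody = $xpath->query('/html/body/table/tbody/tr');
// Same path without tbody, matching what libxml actually built.
$withoutTbody = $xpath->query('/html/body/table/tr');

// Anchoring on text content works whether or not a tbody is present.
$container = $xpath->query("//th[contains(text(),'Currency Name')]/parent::tr/parent::*");

echo $withTbody->length, "\n";
echo $withoutTbody->length, "\n";
echo $container->item(0)->nodeName, "\n";
```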
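The XPath query should start with a leading /, like:
/html/...
Without it the path is relative, and DOMXPath::query() evaluates relative paths against the root element (<html>) by default, so html/body/... looks for an <html> child of <html> and finds nothing.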

Finding number of nodes in PHP, DOM, XPath

I am loading HTML into DOM and then querying it using XPath in PHP. My current problem is how do I find out how many matches have been made, and once that is ascertained, how do I access them?
I currently have this dirty solution:
$i = 0;
foreach($nodes as $node) {
echo $dom->savexml($nodes->item($i));
$i++;
}
Is there a cleaner way to find the number of nodes? I have tried count(), but that does not work.
You haven't posted any code related to $nodes so I assume you are using DOMXPath and query(), or at the very least, you have a DOMNodeList.
DOMXPath::query() returns a DOMNodeList, which has a length member. You can access it via (given your code):
$nodes->length
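A short sketch, assuming a hypothetical list of <p> elements as $nodes, that uses length for the count and item() for index-based access; DOMXPath::evaluate() with count() is shown as well and returns a float:

```php
<?php
// Hypothetical document; $nodes plays the role of the question's node list.
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML('<html><body><p>one</p><p>two</p><p>three</p></body></html>');
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//p');

// length holds the number of matches.
$count = $nodes->length;
echo "Found $count nodes\n";

// Index-based access with item(); indexes are 0-based here.
$serialized = array();
for ($i = 0; $i < $nodes->length; $i++) {
    $serialized[] = $dom->saveXML($nodes->item($i));
}

// Or let foreach hand you each node directly, no counter needed.
foreach ($nodes as $node) {
    echo $dom->saveXML($node), "\n";
}

// Counting in XPath itself: count() yields a number (a PHP float).
$viaEvaluate = $xpath->evaluate('count(//p)');
echo $viaEvaluate, "\n";
```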
If you just want to know the count, you can also use DOMXPath::evaluate.
Example from PHP Manual:
$doc = new DOMDocument;
$doc->load('book.xml');
$xpath = new DOMXPath($doc);
$tbody = $doc->getElementsByTagName('tbody')->item(0);
// our query is relative to the tbody node
$query = 'count(row/entry[. = "en"])';
$entries = $xpath->evaluate($query, $tbody);
echo "There are $entries english books\n";
